# ⚖️ FineWeb-Legal Ablation Studies
This repository contains ablation study results for the FineWeb-Legal classifier project. We systematically tested hyperparameters to identify the optimal configuration for legal document classification.
## 📊 Ablation Results

### Sequence Length Impact
Legal documents are long. We tested context windows from 512 to 2048 tokens.
| Context Window | Macro F1 | Accuracy | Binary F1@3 | Impact |
|---|---|---|---|---|
| 512 | 0.5797 | 0.606 | 0.8534 | ❌ Too short |
| 1024 | 0.6645 | 0.721 | 0.9200 | Baseline |
| 2048 | 0.6715 | 0.742 | 0.9177 | ✅ Winner |
Conclusion: Extending the context window to 2048 tokens yields the best Macro F1, a +1.05% relative gain over the 1024-token baseline.
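The relative gain quoted above can be reproduced directly from the table:

```python
# Macro F1 scores from the sequence-length table above.
f1_1024 = 0.6645  # baseline
f1_2048 = 0.6715  # winner

absolute_gain = f1_2048 - f1_1024
relative_gain_pct = 100 * absolute_gain / f1_1024

print(f"+{absolute_gain:.4f} absolute, +{relative_gain_pct:.2f}% relative")
# → +0.0070 absolute, +1.05% relative
```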
### Learning Rate Sweep
We swept learning rates from 1e-4 to 1e-3.
| LR | Macro F1 | Accuracy | Binary F1@3 | Notes |
|---|---|---|---|---|
| 1e-4 | 0.6548 | 0.708 | 0.9074 | Underfitting |
| 2e-4 | 0.6645 | 0.721 | 0.9200 | Stable |
| 5e-4 | 0.6655 | 0.725 | 0.9200 | Best |
| 1e-3 | 0.6644 | 0.719 | 0.9176 | Diminishing returns |
Conclusion: Performance plateaus between 2e-4 and 5e-4; we adopt 3e-4, inside this plateau, for the recommended configuration.
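Selecting the sweep winner programmatically is a one-liner; a minimal sketch over the table's numbers:

```python
# Macro F1 per learning rate, copied from the sweep table above.
sweep = {1e-4: 0.6548, 2e-4: 0.6645, 5e-4: 0.6655, 1e-3: 0.6644}

best_lr = max(sweep, key=sweep.get)
print(best_lr, sweep[best_lr])  # → 0.0005 0.6655
```

Note how flat the 2e-4 to 1e-3 region is; the "best" rate wins by only 0.001 Macro F1, which is why the conclusion reports a plateau rather than a single point.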
### LoRA Rank Analysis
We varied the LoRA rank to test adapter capacity.
| Rank | Macro F1 | Accuracy | Binary F1@3 | Notes |
|---|---|---|---|---|
| 8 | 0.6406 | 0.688 | 0.9074 | Underfitting |
| 16 | 0.6645 | 0.721 | 0.9200 | Optimal |
| 32 | 0.6645 | 0.721 | 0.9200 | No gain, higher VRAM |
Conclusion: Rank 16 is the sweet spot.
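The rank-vs-VRAM tradeoff follows from the LoRA parameter count: an adapter on a `d_in × d_out` linear layer adds `r × (d_in + d_out)` trainable weights. A sketch with a hypothetical 768-dimensional projection (the real layer shapes depend on the base model's architecture):

```python
def lora_params(rank: int, d_in: int, d_out: int) -> int:
    """Trainable parameters of one LoRA adapter: A (d_in x r) plus B (r x d_out)."""
    return rank * (d_in + d_out)

# Hypothetical 768x768 projection; actual dims depend on embeddinggemma-300m.
for r in (8, 16, 32):
    print(r, lora_params(r, 768, 768))
# Doubling the rank doubles adapter parameters (and optimizer state),
# which is why r=32 costs more VRAM for no measurable F1 gain.
```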
### Class Weights
| Configuration | Macro F1 | Accuracy | Binary F1@3 |
|---|---|---|---|
| No Weights | 0.6635 | 0.719 | 0.9200 |
| With Weights | 0.6645 | 0.721 | 0.9200 |
Conclusion: Class weights give a small but consistent improvement (+0.0010 Macro F1) on the imbalanced label distribution, so we keep them enabled.
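A common way to derive such weights is inverse class frequency, normalized so that a uniform distribution yields weight 1 for every class. A minimal sketch (the label counts below are illustrative, not the actual FineWeb-Legal distribution):

```python
from collections import Counter

def class_weights(labels, num_classes):
    """Inverse-frequency weights: total / (num_classes * count_c) for class c."""
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts[c]) for c in range(num_classes)]

# Illustrative imbalanced labels for the 6-class (0-5) quality task.
labels = [0] * 500 + [1] * 250 + [2] * 120 + [3] * 80 + [4] * 35 + [5] * 15
weights = class_weights(labels, num_classes=6)
print([round(w, 2) for w in weights])
```

Rare classes (scores 4-5) receive the largest weights, so their misclassifications contribute more to the loss.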
## 🎯 Optimal Configuration
Based on these studies, the recommended V2 configuration is:
- Sequence Length: 2048 tokens
- Learning Rate: 3e-4
- LoRA Rank: 16
- Class Weights: Enabled
- Base Model: `google/embeddinggemma-300m`
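Collected in one place, the recommendation looks like this (key names are illustrative; the authoritative schema is each experiment's `config.json`):

```python
# Recommended V2 hyperparameters; key names are illustrative.
V2_CONFIG = {
    "base_model": "google/embeddinggemma-300m",
    "max_seq_length": 2048,
    "learning_rate": 3e-4,
    "lora_rank": 16,
    "use_class_weights": True,
}
```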
## 📁 Repository Structure
```
ablation_results/
├── lr_1e-4/          # Learning rate experiments
├── lr_2e-4/
├── lr_5e-4/
├── lr_1e-3/
├── lora_r8/          # LoRA rank experiments
├── lora_r16/
├── lora_r32/
├── seq_512/          # Sequence length experiments
├── seq_1024/
├── seq_2048/
├── with_weights/     # Class weight experiments
├── no_weights/
└── master_summary.json
```
Each experiment folder contains:
- `results.json` - Performance metrics
- `config.json` - Hyperparameters used
- `model/` - Trained LoRA adapters
- `confusion_matrix.npy` - Confusion matrix
- `classification_report.txt` - Detailed metrics
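Given that layout, aggregating scores across experiments is straightforward; a sketch assuming each `results.json` exposes a top-level `macro_f1` key (the actual schema may differ):

```python
import json
from pathlib import Path

def collect_macro_f1(root: str) -> dict:
    """Map each experiment folder name to the macro_f1 in its results.json."""
    scores = {}
    for results_file in Path(root).glob("*/results.json"):
        with open(results_file) as f:
            scores[results_file.parent.name] = json.load(f)["macro_f1"]
    return scores
```

Sorting the returned dict by value reproduces a leaderboard like the tables above.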
## 🔗 Related Artifacts
- Filtered Dataset: FineWeb-Legal-Pilot
- Raw Annotations: FineWeb-Legal-Annotations
- Code & Documentation: GitHub Repository
## 📝 Methodology
All experiments used:
- Hardware: NVIDIA RTX 4090 (24GB VRAM)
- Training samples: 5,000 (stratified)
- Validation samples: 1,000 (stratified)
- Base model: `google/embeddinggemma-300m`
- Task: 6-class legal quality classification (scores 0-5)
- Metric: Macro F1 Score
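Macro F1 averages per-class F1 with equal weight, which is why it is the right headline metric for imbalanced classes. A sketch computing it from a saved confusion matrix, assuming the sklearn convention used by the `confusion_matrix.npy` artifacts (rows = true class, columns = predicted):

```python
import numpy as np

def macro_f1(cm: np.ndarray) -> float:
    """Macro F1 from a confusion matrix (rows: true class, cols: predicted)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c, but wrong
    fn = cm.sum(axis=1) - tp  # true class c, but missed
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1e-12)  # guard against 0/0
    return float(f1.mean())

# Toy 2-class example: 2 correct and 1 error each way per class.
cm = np.array([[2, 1], [1, 2]])
print(round(macro_f1(cm), 4))  # → 0.6667
```

A class with zero support contributes an F1 of 0 under this guard, matching the usual zero-division convention.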
## License
MIT License.