doclaynet-yolo26l — Document Layout Analysis
A YOLO26-Large model fine-tuned on DocLayNet v1.2 for Document Layout Analysis (DLA). It detects 10 semantic layout element types across diverse document styles (financial reports, patents, scientific papers, laws, manuals, and more).
🚀 Try it live: doclaynet-yolo26l-demo Space
⚠️ Class Index Note
The `docling-project/DocLayNet-v1.2` Hugging Face dataset uses 1-indexed `category_id` values (1=Caption … 11=Title). The training script stored these directly as YOLO label indices without subtracting 1. As a result:
- YOLO class 0 (`unknown`): never appeared in any label; its output is unreliable
- YOLO classes 1–10: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text
- Title (`category_id=11`): skipped during dataset preparation; this checkpoint does not detect Titles
The model weights, class names in the checkpoint, and this card have all been corrected to reflect the true mapping. Retraining with the fixed prepare_dataset.py (which now subtracts 1) will recover the Title class.
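The corrected conversion comes down to an off-by-one shift. A minimal sketch of the idea (function and dictionary names are illustrative, not taken from the actual `prepare_dataset.py`):

```python
# Illustrative sketch of the corrected id handling:
# DocLayNet category_id is 1-indexed, while YOLO expects 0-indexed classes.

DOCLAYNET_CATEGORIES = {
    1: "Caption", 2: "Footnote", 3: "Formula", 4: "List-item",
    5: "Page-footer", 6: "Page-header", 7: "Picture",
    8: "Section-header", 9: "Table", 10: "Text", 11: "Title",
}

def to_yolo_class(category_id: int) -> int:
    """Map a 1-indexed DocLayNet category_id to a 0-indexed YOLO class."""
    return category_id - 1  # the fix: 1=Caption -> 0, ..., 11=Title -> 10
```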
Model Details
| Property | Value |
|---|---|
| Architecture | YOLO26-Large |
| Parameters | 24.8M |
| GFLOPs | 86.1 |
| Input size | 1024 × 1024 |
| Optimizer | AdamW |
| Epochs trained | 20 (best at epoch 17) |
| Batch size | 80 |
| Training device | NVIDIA H200 NVL |
Test Set Evaluation (DocLayNet v1.2 test split, 4,999 images)
Evaluated with `conf=0.001`, `iou=0.5`, `imgsz=1024`. Class names corrected to reflect the actual training-data mapping.
Overall Metrics
| Metric | Value |
|---|---|
| mAP@50 | 0.9152 |
| mAP@50-95 | 0.7806 |
| Precision | 0.8877 |
| Recall | 0.8438 |
| F1 (mean) | 0.8652 |
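As a quick sanity check, the reported mean F1 agrees with the harmonic mean of the overall precision and recall (assuming F1 is computed from the aggregate precision/recall; the card does not state this explicitly):

```python
# Verify F1 = 2PR / (P + R) against the reported overall metrics.
precision = 0.8877
recall = 0.8438

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # -> 0.8652, matching the table above
```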
Per-Class AP@50 (corrected class names)
| Class | AP@50 | Precision | Recall | F1 |
|---|---|---|---|---|
| Text | 0.9581 | 0.9157 | 0.9043 | 0.9100 |
| List-item | 0.9510 | 0.9042 | 0.8919 | 0.8981 |
| Page-footer | 0.9396 | 0.9111 | 0.8696 | 0.8898 |
| Formula | 0.9255 | 0.8811 | 0.8693 | 0.8752 |
| Section-header | 0.9234 | 0.8750 | 0.8577 | 0.8663 |
| Page-header | 0.9225 | 0.9058 | 0.7855 | 0.8414 |
| Table | 0.9129 | 0.9079 | 0.8523 | 0.8792 |
| Caption | 0.8901 | 0.8636 | 0.7939 | 0.8273 |
| Footnote | 0.8859 | 0.9066 | 0.8114 | 0.8563 |
| Picture | 0.8433 | 0.8056 | 0.8020 | 0.8038 |
| Title | not trained | — | — | — |
Classes
| YOLO ID | Class | Notes |
|---|---|---|
| 0 | unknown | Never in training labels |
| 1 | Caption | |
| 2 | Footnote | |
| 3 | Formula | |
| 4 | List-item | |
| 5 | Page-footer | |
| 6 | Page-header | |
| 7 | Picture | |
| 8 | Section-header | |
| 9 | Table | |
| 10 | Text | |
| — | Title | Skipped in this checkpoint; retrain with fixed prepare_dataset.py |
Training Configuration
| Parameter | Value |
|---|---|
| Base model | yolo26l.pt |
| Dataset | DocLayNet v1.2 |
| Epochs | 20 |
| Patience | 7 |
| Batch size | 80 |
| Image size | 1024 |
| Optimizer | AdamW |
| AMP | True |
| Augmentation | RandAugment + erasing |
Usage

```python
from ultralytics import YOLO

model = YOLO("tuandunghcmut/doclaynet-yolo26l")

results = model.predict(
    source="your_document.jpg",
    imgsz=1024,
    conf=0.25,
    iou=0.7,
    line_width=1,
)
results[0].show()
```
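Because class 0 is unused in this checkpoint, downstream code should always resolve ids through the class table above. A small, self-contained helper that groups detections by class name (the tuple layout below mimics detection data for illustration; it is an assumption, not the Ultralytics `Boxes` API):

```python
# Group (class_id, confidence, xyxy-box) detections by this checkpoint's class names.
CHECKPOINT_NAMES = {
    1: "Caption", 2: "Footnote", 3: "Formula", 4: "List-item",
    5: "Page-footer", 6: "Page-header", 7: "Picture",
    8: "Section-header", 9: "Table", 10: "Text",
}

def group_by_class(detections):
    """detections: iterable of (class_id, conf, (x1, y1, x2, y2)) tuples."""
    grouped = {}
    for cls_id, conf, box in detections:
        name = CHECKPOINT_NAMES.get(cls_id, "unknown")  # id 0 falls through here
        grouped.setdefault(name, []).append((conf, box))
    return grouped

# Example with fabricated detections:
dets = [
    (10, 0.91, (50, 60, 900, 400)),   # Text
    (9, 0.88, (50, 450, 900, 800)),   # Table
    (10, 0.84, (50, 820, 900, 980)),  # Text
]
by_class = group_by_class(dets)
print({name: len(items) for name, items in by_class.items()})
# -> {'Text': 2, 'Table': 1}
```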
Citation

```bibtex
@inproceedings{doclaynet2022,
  title     = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
  author    = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter},
  booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year      = {2022}
}
```