doclaynet-yolo26l — Document Layout Analysis
A YOLO26-Large model fine-tuned on DocLayNet v1.2 for Document Layout Analysis (DLA). It detects 10 semantic layout element types across diverse document styles (financial reports, patents, scientific papers, laws, manuals, and more).
🚀 Try it live: doclaynet-yolo26l-demo Space
⚠️ Class Index Note
The `docling-project/DocLayNet-v1.2` Hugging Face dataset uses 1-indexed `category_id` values (1=Caption … 11=Title). The training script stored these directly as YOLO label indices without subtracting 1. As a result:
- YOLO class 0 (`unknown`): never appeared in any label; its output is unreliable
- YOLO classes 1–10: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text
- Title (`category_id=11`): skipped during dataset preparation; this checkpoint does not detect Titles
The model weights, class names in the checkpoint, and this card have all been corrected to reflect the true mapping. Retraining with the fixed prepare_dataset.py (which now subtracts 1) will recover the Title class.
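The corrected conversion comes down to an off-by-one shift. A minimal sketch of the idea (function and dictionary names are illustrative, not taken from the actual `prepare_dataset.py`):

```python
# Illustrative sketch of the corrected id handling:
# DocLayNet category_id is 1-indexed, while YOLO expects 0-indexed classes.

DOCLAYNET_CATEGORIES = {
    1: "Caption", 2: "Footnote", 3: "Formula", 4: "List-item",
    5: "Page-footer", 6: "Page-header", 7: "Picture",
    8: "Section-header", 9: "Table", 10: "Text", 11: "Title",
}

def to_yolo_class(category_id: int) -> int:
    """Map a 1-indexed DocLayNet category_id to a 0-indexed YOLO class."""
    return category_id - 1  # the fix: 1=Caption -> 0, ..., 11=Title -> 10
```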
Model Details
| Property | Value |
|---|---|
| Architecture | YOLO26-Large |
| Parameters | 24.8M |
| GFLOPs | 86.1 |
| Input size | 1024 × 1024 |
| Optimizer | AdamW |
| Epochs trained | 20 (best at epoch 17) |
| Batch size | 80 |
| Training device | NVIDIA H200 NVL |
Test Set Evaluation (DocLayNet v1.2 test split, 4,999 images)
Evaluated with `conf=0.001`, `iou=0.5`, `imgsz=1024`. Class names corrected to reflect the actual training-data mapping.
Overall Metrics
| Metric | Value |
|---|---|
| mAP@50 | 0.9152 |
| mAP@50-95 | 0.7806 |
| Precision | 0.8877 |
| Recall | 0.8438 |
| F1 (mean) | 0.8652 |
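As a quick sanity check, the reported mean F1 agrees with the harmonic mean of the overall precision and recall (assuming F1 is computed from the aggregate precision/recall; the card does not state this explicitly):

```python
# Verify F1 = 2PR / (P + R) against the reported overall metrics.
precision = 0.8877
recall = 0.8438

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # -> 0.8652, matching the table above
```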
Per-Class AP@50 (corrected class names)
| Class | AP@50 | Precision | Recall | F1 |
|---|---|---|---|---|
| Text | 0.9581 | 0.9157 | 0.9043 | 0.9100 |
| List-item | 0.9510 | 0.9042 | 0.8919 | 0.8981 |
| Page-footer | 0.9396 | 0.9111 | 0.8696 | 0.8898 |
| Formula | 0.9255 | 0.8811 | 0.8693 | 0.8752 |
| Section-header | 0.9234 | 0.8750 | 0.8577 | 0.8663 |
| Page-header | 0.9225 | 0.9058 | 0.7855 | 0.8414 |
| Table | 0.9129 | 0.9079 | 0.8523 | 0.8792 |
| Caption | 0.8901 | 0.8636 | 0.7939 | 0.8273 |
| Footnote | 0.8859 | 0.9066 | 0.8114 | 0.8563 |
| Picture | 0.8433 | 0.8056 | 0.8020 | 0.8038 |
| Title | not trained | — | — | — |
Classes
| YOLO ID | Class | Notes |
|---|---|---|
| 0 | unknown | Never in training labels |
| 1 | Caption | |
| 2 | Footnote | |
| 3 | Formula | |
| 4 | List-item | |
| 5 | Page-footer | |
| 6 | Page-header | |
| 7 | Picture | |
| 8 | Section-header | |
| 9 | Table | |
| 10 | Text | |
| — | Title | Skipped in this checkpoint; retrain with fixed prepare_dataset.py |
Training Configuration
| Parameter | Value |
|---|---|
| Base model | yolo26l.pt |
| Dataset | DocLayNet v1.2 |
| Epochs | 20 |
| Patience | 7 |
| Batch size | 80 |
| Image size | 1024 |
| Optimizer | AdamW |
| AMP | True |
| Augmentation | RandAugment + erasing |
Usage

```python
from ultralytics import YOLO

model = YOLO("tuandunghcmut/doclaynet-yolo26l")

results = model.predict(
    source="your_document.jpg",
    imgsz=1024,
    conf=0.25,
    iou=0.7,
    line_width=1,
)
results[0].show()
```
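Because class 0 is unused in this checkpoint, downstream code should always resolve ids through the class table above. A small, self-contained helper that groups detections by class name (the tuple layout below mimics detection data for illustration; it is an assumption, not the Ultralytics `Boxes` API):

```python
# Group (class_id, confidence, xyxy-box) detections by this checkpoint's class names.
CHECKPOINT_NAMES = {
    1: "Caption", 2: "Footnote", 3: "Formula", 4: "List-item",
    5: "Page-footer", 6: "Page-header", 7: "Picture",
    8: "Section-header", 9: "Table", 10: "Text",
}

def group_by_class(detections):
    """detections: iterable of (class_id, conf, (x1, y1, x2, y2)) tuples."""
    grouped = {}
    for cls_id, conf, box in detections:
        name = CHECKPOINT_NAMES.get(cls_id, "unknown")  # id 0 falls through here
        grouped.setdefault(name, []).append((conf, box))
    return grouped

# Example with fabricated detections:
dets = [
    (10, 0.91, (50, 60, 900, 400)),   # Text
    (9, 0.88, (50, 450, 900, 800)),   # Table
    (10, 0.84, (50, 820, 900, 980)),  # Text
]
by_class = group_by_class(dets)
print({name: len(items) for name, items in by_class.items()})
# -> {'Text': 2, 'Table': 1}
```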
Citation

```bibtex
@inproceedings{doclaynet2022,
  title     = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
  author    = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter},
  booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year      = {2022}
}
```