doclaynet-yolo26l — Document Layout Analysis

A YOLO26-Large model fine-tuned on DocLayNet v1.2 for Document Layout Analysis (DLA). It detects 10 semantic layout element types across diverse document styles (financial reports, patents, scientific papers, laws, manuals, and more).

🚀 Try it live: doclaynet-yolo26l-demo Space


⚠️ Class Index Note

The docling-project/DocLayNet-v1.2 HuggingFace dataset uses 1-indexed category_id values (1=Caption … 11=Title). The training script stored these directly as YOLO label indices without subtracting 1. As a result:

  • YOLO class 0 (unknown): never appeared in any label, so predictions for it are unreliable
  • YOLO classes 1–10: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text (in that order)
  • Title (category_id=11): skipped during dataset preparation, so this checkpoint does not detect Titles

The model weights, class names in the checkpoint, and this card have all been corrected to reflect the true mapping. Retraining with the fixed prepare_dataset.py (which now subtracts 1) will recover the Title class.
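
The index shift described in this note can be sketched as follows. This is a minimal illustration, not the actual prepare_dataset.py: the names DOCLAYNET_CATEGORIES, yolo_index_fixed, and CHECKPOINT_NAMES are hypothetical, and the category order is taken from the mapping listed on this card.

```python
# DocLayNet category names in category_id order (category_id is 1-indexed).
DOCLAYNET_CATEGORIES = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
]


def yolo_index_fixed(category_id: int) -> int:
    """Return the 0-indexed YOLO label for a 1-indexed category_id.

    This mirrors the fix described above (subtract 1); it is a sketch,
    not the real prepare_dataset.py.
    """
    if not 1 <= category_id <= len(DOCLAYNET_CATEGORIES):
        raise ValueError(f"unexpected category_id: {category_id}")
    return category_id - 1


# Mapping used by THIS checkpoint: category_id was stored unchanged,
# so index 0 is unused and Title (category_id=11) is missing.
CHECKPOINT_NAMES = {0: "unknown"}
CHECKPOINT_NAMES.update(
    {i: name for i, name in enumerate(DOCLAYNET_CATEGORIES[:10], start=1)}
)
```

With the fixed mapping, Caption becomes class 0 and Title becomes class 10; in this checkpoint, Caption is class 1 and Text is class 10.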


Model Details

| Property | Value |
|---|---|
| Architecture | YOLO26-Large |
| Parameters | 24.8M |
| GFLOPs | 86.1 |
| Input size | 1024 × 1024 |
| Optimizer | AdamW |
| Epochs trained | 20 (best at epoch 17) |
| Batch size | 80 |
| Training device | NVIDIA H200 NVL |

Test Set Evaluation (DocLayNet v1.2 — test split, 4 999 images)

Evaluated with conf=0.001, iou=0.5, imgsz=1024. Class names are corrected to reflect the actual training-data mapping.
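
A minimal sketch of how these evaluation settings might be passed to the Ultralytics validation API. The evaluate helper is hypothetical, and the weights path and dataset YAML are placeholders supplied by the caller; this is not the author's evaluation script.

```python
# Evaluation settings quoted on this card.
EVAL_KWARGS = {"split": "test", "conf": 0.001, "iou": 0.5, "imgsz": 1024}


def evaluate(weights: str, data_yaml: str):
    """Run Ultralytics validation with the settings above.

    `weights` and `data_yaml` are placeholders (assumptions), e.g. a local
    checkpoint file and a dataset YAML describing the DocLayNet splits.
    """
    # Imported lazily so the settings can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(weights)
    metrics = model.val(data=data_yaml, **EVAL_KWARGS)
    # metrics.box.map50 is mAP@50; metrics.box.map is mAP@50-95.
    return metrics.box.map50, metrics.box.map
```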

Overall Metrics

| Metric | Value |
|---|---|
| mAP@50 | 0.9152 |
| mAP@50-95 | 0.7806 |
| Precision | 0.8877 |
| Recall | 0.8438 |
| F1 (mean) | 0.8652 |
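
The F1 value above is consistent with the harmonic mean of the reported precision and recall. A small check, assuming that is how it was derived (the card does not say so explicitly); f1_score is an illustrative helper:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported on this card.
overall_f1 = f1_score(0.8877, 0.8438)
print(f"{overall_f1:.4f}")  # 0.8652
```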

Per-Class AP@50 (corrected class names)

| Class | AP@50 | Precision | Recall | F1 |
|---|---|---|---|---|
| Text | 0.9581 | 0.9157 | 0.9043 | 0.9100 |
| List-item | 0.9510 | 0.9042 | 0.8919 | 0.8981 |
| Page-footer | 0.9396 | 0.9111 | 0.8696 | 0.8898 |
| Formula | 0.9255 | 0.8811 | 0.8693 | 0.8752 |
| Section-header | 0.9234 | 0.8750 | 0.8577 | 0.8663 |
| Page-header | 0.9225 | 0.9058 | 0.7855 | 0.8414 |
| Table | 0.9129 | 0.9079 | 0.8523 | 0.8792 |
| Caption | 0.8901 | 0.8636 | 0.7939 | 0.8273 |
| Footnote | 0.8859 | 0.9066 | 0.8114 | 0.8563 |
| Picture | 0.8433 | 0.8056 | 0.8020 | 0.8038 |
| Title | not trained | | | |

Classes

| YOLO ID | Class | Notes |
|---|---|---|
| 0 | unknown | Never in training labels |
| 1 | Caption | |
| 2 | Footnote | |
| 3 | Formula | |
| 4 | List-item | |
| 5 | Page-footer | |
| 6 | Page-header | |
| 7 | Picture | |
| 8 | Section-header | |
| 9 | Table | |
| 10 | Text | |
| | Title | Skipped in this checkpoint; retrain with fixed prepare_dataset.py |

Training Configuration

| Parameter | Value |
|---|---|
| Base model | yolo26l.pt |
| Dataset | DocLayNet v1.2 |
| Epochs | 20 |
| Patience | 7 |
| Batch size | 80 |
| Image size | 1024 |
| Optimizer | AdamW |
| AMP | True |
| Augmentation | RandAugment + erasing |
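
As a rough sketch, the configuration above could map onto an Ultralytics training call like the one below. The train helper and the dataset YAML path are hypothetical, and the augmentation settings are left at library defaults because the card does not give their exact values; this is not the author's training script.

```python
# Training settings taken from the table above.
TRAIN_KWARGS = {
    "epochs": 20,
    "patience": 7,       # early-stopping patience, in epochs
    "batch": 80,
    "imgsz": 1024,
    "optimizer": "AdamW",
    "amp": True,         # automatic mixed precision
}


def train(data_yaml: str, base_weights: str = "yolo26l.pt"):
    """Fine-tune the base model with the settings above.

    `data_yaml` is a placeholder for a dataset YAML produced by the
    dataset preparation step; augmentation is left at defaults (assumption).
    """
    # Imported lazily so the settings can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(base_weights)
    return model.train(data=data_yaml, **TRAIN_KWARGS)
```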

Usage

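from ultralytics import YOLO

# Load the fine-tuned checkpoint
model = YOLO("tuandunghcmut/doclaynet-yolo26l")

# Run inference at the training resolution
results = model.predict(
    source="your_document.jpg",
    imgsz=1024,      # same image size as used for training
    conf=0.25,       # minimum confidence for a detection to be kept
    iou=0.7,         # IoU threshold for non-maximum suppression
    line_width=1,    # thickness of drawn bounding boxes
)
# Display the first image with its predicted layout boxes
results[0].show()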
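
Because class 0 (unknown) never appeared in the training labels, it may be worth discarding any detections assigned to it. The sketch below is one way to do that on plain Python data; Detection and drop_unknown are hypothetical helpers, not part of Ultralytics, and the sample values are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    class_id: int
    confidence: float
    xyxy: Tuple[float, float, float, float]  # box corners: x1, y1, x2, y2


UNKNOWN_CLASS_ID = 0  # never present in the training labels of this checkpoint


def drop_unknown(detections: List[Detection]) -> List[Detection]:
    """Keep only detections whose class is one of the trained classes (1-10)."""
    return [d for d in detections if d.class_id != UNKNOWN_CLASS_ID]


# With an Ultralytics result, detections could be built like this:
#   boxes = results[0].boxes
#   detections = [
#       Detection(int(c), float(s), tuple(b))
#       for c, s, b in zip(boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist())
#   ]
sample = [
    Detection(0, 0.31, (10.0, 10.0, 50.0, 40.0)),     # unknown: dropped
    Detection(9, 0.92, (30.0, 120.0, 600.0, 480.0)),  # Table: kept
    Detection(10, 0.88, (30.0, 500.0, 600.0, 560.0)), # Text: kept
]
kept = drop_unknown(sample)
```

Ultralytics' predict also accepts a classes argument, so passing classes=list(range(1, 11)) should have a similar effect at inference time.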

Citation

@inproceedings{doclaynet2022,
  title     = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
  author    = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter},
  booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year      = {2022}
}