YOLO11n Text

A fine-tuned YOLO11n model for detecting text regions in images. It is optimized for predicting text bounding boxes in documents, screenshots, user interfaces, and natural scene images.

Model Description

This model is based on Ultralytics YOLO11n (the nano variant) and has been fine-tuned for text detection. It locates text regions as bounding boxes but does not read the text itself, so its output is meant to feed OCR pipelines or UI automation tasks.

Model Architecture

Base Model: YOLO11n (nano)
Parameters: 2,590,035
Layers: 181
Input Size: 640x640
Classes: 1 (text)
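
As a rough guide to what the parameter count means for storage, the sketch below estimates the raw weight size at two precisions. It counts weights only, so actual checkpoint and export sizes will differ because of metadata and format overhead.

```python
# Rough weight-storage estimate from the parameter count listed above.
# Assumption: weights only; real files also carry metadata and other state.
PARAMS = 2_590_035

def weight_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Return the raw weight size in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6

fp32_mb = weight_megabytes(PARAMS, 4)  # 32-bit floats
fp16_mb = weight_megabytes(PARAMS, 2)  # 16-bit floats
print(f"FP32: {fp32_mb:.2f} MB, FP16: {fp16_mb:.2f} MB")
```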

Training Details

Dataset

Source: DonkeySmall/Yolo-Text-Detection
Training Images: 22,661
Validation Images: 2,518
Total Images: 25,179
Format: YOLO (normalized xywh)
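
In the YOLO label format, each line stores a class id followed by the box center and size, all normalized to the image dimensions. A minimal sketch of converting one such line to pixel corner coordinates follows; the label line and image size are made-up examples, not taken from the dataset.

```python
def yolo_to_xyxy(line: str, img_w: int, img_h: int):
    """Convert one 'class xc yc w h' label line to (class_id, x1, y1, x2, y2) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return class_id, x1, y1, x2, y2

# Hypothetical label line for a 1280x720 image
print(yolo_to_xyxy("0 0.5 0.5 0.25 0.1", 1280, 720))
```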

Training Configuration

Parameter	Value
Epochs	50
Batch Size	16
Image Size	640
Optimizer	SGD (auto)
Learning Rate	0.01 → 0.0003
Momentum	0.937
Weight Decay	0.0005
Warmup Epochs	3.0
AMP	Enabled
Workers	8

Augmentation

Augmentation	Value
HSV Hue	0.015
HSV Saturation	0.7
HSV Value	0.4
Translation	0.1
Scale	0.5
Horizontal Flip	0.5
Mosaic	1.0
Erasing	0.4
Auto Augment	randaugment
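
The two tables above map onto keyword arguments of the Ultralytics `train()` call. The sketch below shows one way such a run could be set up; it is not the exact command used for this model. The dataset config path `data.yaml` and the starting weights `yolo11n.pt` are assumptions, and the final learning-rate factor is left at the library default because this card does not list it. The Ultralytics documentation describes `erasing` and `auto_augment` as classification-oriented settings, so they may have no effect on detection training.

```python
# Hyperparameters copied from the tables above; keyword names follow the
# Ultralytics train() arguments.
TRAIN_ARGS = dict(
    epochs=50,
    batch=16,
    imgsz=640,
    optimizer="auto",
    lr0=0.01,
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3.0,
    amp=True,
    workers=8,
)
AUGMENT_ARGS = dict(
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,
    translate=0.1, scale=0.5, fliplr=0.5,
    mosaic=1.0, erasing=0.4, auto_augment="randaugment",
)

def train(data_yaml: str = "data.yaml", weights: str = "yolo11n.pt"):
    """Fine-tune YOLO11n. 'data.yaml' is a hypothetical dataset config path."""
    # Imported lazily so the settings can be inspected without the package.
    from ultralytics import YOLO
    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS, **AUGMENT_ARGS)

# Example call (starts a full training run):
# train("data.yaml")
```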

Hardware

GPU: NVIDIA GeForce RTX 5070 Ti (16GB VRAM)
Training Time: ~1.74 hours (6,267 seconds)
Framework: Ultralytics 8.3.240, PyTorch 2.9.1+cu128
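
For a sense of scale, the reported figures can be turned into iterations and average time per epoch. This is simple arithmetic on numbers from this card, and it assumes the reported training time covers the whole 50-epoch run.

```python
import math

TRAIN_IMAGES = 22_661
BATCH_SIZE = 16
EPOCHS = 50
TOTAL_SECONDS = 6_267

iters_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)  # the last batch may be partial
total_iters = iters_per_epoch * EPOCHS
seconds_per_epoch = TOTAL_SECONDS / EPOCHS  # average over the run

print(iters_per_epoch, total_iters, round(seconds_per_epoch, 1))
```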

Performance Metrics

Final Results (Epoch 50)

Metric	Value
Precision	95.7%
Recall	93.6%
mAP@50	97.6%
mAP@50-95	81.8%
Box Loss	0.619
Class Loss	0.376
DFL Loss	0.828
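
Precision and recall can be combined into a single F1 score, the harmonic mean of the two. The sketch below applies the standard formula to the single operating point reported above; the F1 value is derived here and is not a separately reported metric.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.957, 0.936)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.946"
```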

Training Progress

Epoch	mAP@50	mAP@50-95	Precision	Recall
1	89.1%	64.3%	86.0%	82.7%
10	95.9%	76.8%	93.5%	90.7%
20	96.9%	79.5%	94.8%	92.0%
30	97.3%	80.8%	95.1%	93.1%
40	97.6%	81.5%	95.6%	93.5%
50	97.6%	81.8%	95.7%	93.6%

Usage

Installation

pip install ultralytics

Inference

from ultralytics import YOLO

# Load the model
model = YOLO("best.pt")

# Run inference
results = model.predict(
    source="image.jpg",
    conf=0.25,
    iou=0.7,
    imgsz=640
)

# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Get bounding box coordinates (x1, y1, x2, y2)
        xyxy = box.xyxy[0].tolist()
        confidence = box.conf[0].item()
        print(f"Text box: {xyxy}, confidence: {confidence:.2f}")

Batch Processing

from ultralytics import YOLO

model = YOLO("best.pt")

# Process a folder of images
results = model.predict(
    source="path/to/images/",
    conf=0.25,
    save=True,  # Save annotated images
    save_txt=True  # Save YOLO format labels
)

Export to Other Formats

from ultralytics import YOLO

model = YOLO("best.pt")

# Export to ONNX
model.export(format="onnx", imgsz=640, simplify=True)

# Export to TensorRT (for NVIDIA GPUs)
model.export(format="engine", imgsz=640, half=True)

# Export to CoreML (for Apple devices)
model.export(format="coreml", imgsz=640)
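
When an exported model is run outside the Ultralytics wrapper, the caller usually has to fit each image into the 640x640 input and map predicted boxes back to the original image. The sketch below covers only that geometry, assuming an aspect-preserving resize with centered padding (letterboxing). The function names are illustrative, and the padding convention should be checked against the runtime actually used.

```python
def letterbox_params(img_w: int, img_h: int, size: int = 640):
    """Scale factor and (left, top) padding for an aspect-preserving resize."""
    scale = min(size / img_w, size / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x = (size - new_w) / 2
    pad_y = (size - new_h) / 2
    return scale, pad_x, pad_y

def to_original(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from model-input pixels back to the source image."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)

# A 1280x720 image is halved to 640x360 and padded by 140 px top and bottom.
scale, pad_x, pad_y = letterbox_params(1280, 720)
print(scale, pad_x, pad_y)
print(to_original((0, 140, 640, 500), scale, pad_x, pad_y))
```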

Model Files

File	Description
`best.pt`	Best checkpoint (highest validation fitness score tracked by Ultralytics during training)
`args.yaml`	Training configuration
`results.csv`	Training metrics per epoch
`results.png`	Training curves visualization
`confusion_matrix.png`	Confusion matrix
`BoxPR_curve.png`	Precision-Recall curve
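
The per-epoch metrics in `results.csv` can be inspected with the standard library alone. The sketch below finds the epoch with the highest value of a chosen metric. The column names `epoch` and `metrics/mAP50-95(B)` are assumptions based on the usual Ultralytics detection CSV layout, so check them against the file's header row.

```python
import csv
from pathlib import Path

def best_epoch(csv_path: str, metric: str = "metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    with open(csv_path, newline="") as f:
        # Strip whitespace because some versions pad the column names and values.
        rows = [{k.strip(): v.strip() for k, v in row.items()}
                for row in csv.DictReader(f)]
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])

if Path("results.csv").exists():
    print(best_epoch("results.csv"))
```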

Recommended Inference Parameters

Parameter	Recommended	Description
`conf`	0.25	Confidence threshold
`iou`	0.7	NMS IoU threshold
`imgsz`	640-1024	Input image size (the model was trained at 640)
`max_det`	300	Maximum detections per image
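
To illustrate what these parameters control, the sketch below applies a confidence cut, greedy non-maximum suppression, and a detection cap to a small list of made-up boxes. It is a plain-Python illustration of the general technique, not the Ultralytics implementation.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_detections(dets, conf=0.25, iou_thres=0.7, max_det=300):
    """dets: list of (box, score). Returns kept detections, highest score first."""
    candidates = sorted((d for d in dets if d[1] >= conf),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
        if len(kept) == max_det:
            break
    return kept

dets = [
    ((10, 10, 110, 40), 0.90),
    ((12, 11, 112, 41), 0.60),     # near-duplicate of the first box
    ((10, 60, 110, 90), 0.80),
    ((200, 200, 220, 210), 0.10),  # below the confidence threshold
]
kept = filter_detections(dets)
print(kept)
```

Raising `iou` keeps more overlapping boxes, while raising `conf` drops more low-scoring ones.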

Use Cases

OCR Preprocessing: Detect text regions before applying OCR
Document Analysis: Locate text areas in scanned documents
UI Automation: Find text elements in application screenshots
Scene Text Detection: Detect text in natural images
PDF Processing: Extract text region locations
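
For the OCR preprocessing case, detected boxes are commonly padded slightly and put into reading order before cropping. The sketch below shows one simple heuristic for both steps. The padding of 4 pixels and the row tolerance of 10 pixels are arbitrary assumptions, and the row bucketing is crude enough to misorder boxes that sit near a bucket boundary.

```python
def pad_box(box, img_w, img_h, pad=4):
    """Expand an (x1, y1, x2, y2) box by `pad` pixels, clamped to the image."""
    x1, y1, x2, y2 = box
    return (max(0, int(x1) - pad), max(0, int(y1) - pad),
            min(img_w, int(x2) + pad), min(img_h, int(y2) + pad))

def reading_order(boxes, line_tol=10):
    """Sort boxes top-to-bottom, then left-to-right within rows of similar y."""
    return sorted(boxes, key=lambda b: (round(b[1] / line_tol), b[0]))

# Made-up boxes: two on the first text line, one on the second.
boxes = [(300, 12, 400, 30), (10, 10, 100, 30), (10, 50, 100, 70)]
ordered = reading_order(boxes)
crops = [pad_box(b, img_w=640, img_h=480) for b in ordered]
print(crops)
```

Each padded box can then be passed to an image library's crop function, for example Pillow's `Image.crop`, and the crop handed to the OCR engine of choice.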

Limitations

Optimized for horizontal text; may have reduced accuracy on rotated text
Trained primarily on document and UI images
Single class (text) - does not distinguish between text types
Best performance at 640px input size

Citation

@software{yolo11n_text,
  author = {Ultralytics},
  title = {YOLO11n Text},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/DonkeySmall/Yolo-Text-Detection}
}

@software{ultralytics_yolo,
  author = {Jocher, Glenn and Chaurasia, Ayush and Qiu, Jing},
  title = {Ultralytics YOLO},
  year = {2023},
  publisher = {GitHub},
  url = {https://github.com/ultralytics/ultralytics}
}

License

This model is released under the Apache 2.0 License.

Acknowledgments

Ultralytics for the YOLO11 architecture
DonkeySmall for the training dataset
HuggingFace for model hosting
