YOLO11n Text
A fine-tuned YOLO11n model that detects text regions as bounding boxes in documents, screenshots, user interfaces, and natural scene images.
Model Description
This model is based on Ultralytics YOLO11n (nano variant) and has been fine-tuned specifically for text detection tasks. It detects text regions as bounding boxes, which can be used as input for OCR pipelines or UI automation tasks.
Model Architecture
- Base Model: YOLO11n (nano)
- Parameters: 2,590,035
- Layers: 181
- Input Size: 640x640
- Classes: 1 (text)
Training Details
Dataset
Training Configuration
| Parameter | Value |
| --- | --- |
| Epochs | 50 |
| Batch Size | 16 |
| Image Size | 640 |
| Optimizer | SGD (auto) |
| Learning Rate | 0.01 → 0.0003 |
| Momentum | 0.937 |
| Weight Decay | 0.0005 |
| Warmup Epochs | 3.0 |
| AMP | Enabled |
| Workers | 8 |
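The learning-rate range in the table is consistent with a linear decay that starts at 0.01. The sketch below assumes the Ultralytics default linear schedule with a final factor `lrf=0.01`; neither the schedule type nor `lrf` is stated in this card, so both are assumptions, and warmup is ignored.

```python
# Sketch of a linear learning-rate decay.
# LRF = 0.01 is an assumed value; it does not appear in this model card.
EPOCHS = 50
LR0 = 0.01  # initial learning rate (from the table)
LRF = 0.01  # assumed final learning-rate factor


def lr_at(epoch: int, epochs: int = EPOCHS, lr0: float = LR0, lrf: float = LRF) -> float:
    """Learning rate used during a zero-indexed epoch, ignoring warmup."""
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


if __name__ == "__main__":
    for epoch in (0, 10, 25, 49):
        print(f"epoch {epoch + 1:>2}: lr = {lr_at(epoch):.6f}")
```

Under these assumptions the final epoch runs at roughly 0.0003, which matches the lower bound reported in the table.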
Augmentation
| Augmentation | Value |
| --- | --- |
| HSV Hue | 0.015 |
| HSV Saturation | 0.7 |
| HSV Value | 0.4 |
| Translation | 0.1 |
| Scale | 0.5 |
| Horizontal Flip | 0.5 |
| Mosaic | 1.0 |
| Erasing | 0.4 |
| Auto Augment | randaugment |
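For readers who want to reproduce a similar run, the settings from the two tables above could be passed to the Ultralytics trainer roughly as follows. This is a sketch, not the exact command used for this model: the mapping of table rows to argument names is an assumption, `text.yaml` is a hypothetical dataset config (the card does not name one), and starting from `yolo11n.pt` is assumed from the stated base model.

```python
# Sketch only: "text.yaml" is a hypothetical dataset config, not named in this card.
TRAIN_ARGS = {
    "data": "text.yaml",  # hypothetical path to a single-class dataset config
    "epochs": 50,
    "batch": 16,
    "imgsz": 640,
    "optimizer": "auto",
    "lr0": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3.0,
    "amp": True,
    "workers": 8,
    # Augmentation settings
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "translate": 0.1,
    "scale": 0.5,
    "fliplr": 0.5,
    "mosaic": 1.0,
    "erasing": 0.4,
    "auto_augment": "randaugment",
}


def train(weights: str = "yolo11n.pt"):
    """Fine-tune the pretrained nano checkpoint with the settings above."""
    # Imported lazily so the config can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**TRAIN_ARGS)
```

Calling `train()` starts a run and writes checkpoints and metrics to the Ultralytics run directory.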
Hardware
- GPU: NVIDIA GeForce RTX 5070 Ti (16GB VRAM)
- Training Time: ~1.75 hours (6,267 seconds)
- Framework: Ultralytics 8.3.240, PyTorch 2.9.1+cu128
Performance Metrics
Final Results (Epoch 50)
| Metric | Value |
| --- | --- |
| Precision | 95.7% |
| Recall | 93.6% |
| mAP@50 | 97.6% |
| mAP@50-95 | 81.8% |
| Box Loss | 0.619 |
| Class Loss | 0.376 |
| DFL Loss | 0.828 |
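The card reports precision and recall separately. If a single summary number is useful, the F1 score (their harmonic mean) can be derived from the table; the value below is computed here and is not reported by the original training run.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Epoch-50 precision and recall from the table above
    f1 = f1_score(0.957, 0.936)
    print(f"F1 = {f1:.3f}")
```

With the epoch-50 values this gives an F1 of about 0.946.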
Training Progress
| Epoch | mAP@50 | mAP@50-95 | Precision | Recall |
| --- | --- | --- | --- | --- |
| 1 | 89.1% | 64.3% | 86.0% | 82.7% |
| 10 | 95.9% | 76.8% | 93.5% | 90.7% |
| 20 | 96.9% | 79.5% | 94.8% | 92.0% |
| 30 | 97.3% | 80.8% | 95.1% | 93.1% |
| 40 | 97.6% | 81.5% | 95.6% | 93.5% |
| 50 | 97.6% | 81.8% | 95.7% | 93.6% |
Usage
Installation
pip install ultralytics
Inference
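from ultralytics import YOLO

# Load the fine-tuned weights
model = YOLO("best.pt")

# Run inference on a single image
results = model.predict(
    source="image.jpg",
    conf=0.25,  # confidence threshold
    iou=0.7,    # NMS IoU threshold
    imgsz=640,
)

# Print each detected text box as [x1, y1, x2, y2] with its confidence
for result in results:
    for box in result.boxes:
        xyxy = box.xyxy[0].tolist()
        confidence = box.conf[0].item()
        print(f"Text box: {xyxy}, confidence: {confidence:.2f}")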
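When the detected boxes feed an OCR step, it often helps to add a small margin around each box and keep the crop inside the image. The helper below is a minimal sketch of that idea; `pad_and_clamp` and its default padding are hypothetical and not part of this model or of Ultralytics.

```python
import math


def pad_and_clamp(xyxy, width, height, pad=2):
    """Expand a float [x1, y1, x2, y2] box by `pad` pixels and clip it to the image."""
    x1, y1, x2, y2 = xyxy
    left = max(0, math.floor(x1 - pad))
    top = max(0, math.floor(y1 - pad))
    right = min(width, math.ceil(x2 + pad))
    bottom = min(height, math.ceil(y2 + pad))
    return left, top, right, bottom


if __name__ == "__main__":
    # Example box close to the bottom edge of a 60x22 image
    print(pad_and_clamp([10.4, 5.6, 50.2, 20.9], width=60, height=22, pad=4))
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so each crop can be handed directly to an OCR engine.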
Batch Processing
from ultralytics import YOLO

model = YOLO("best.pt")

# Process every image in a directory
results = model.predict(
    source="path/to/images/",
    conf=0.25,
    save=True,      # save annotated images
    save_txt=True,  # save detections as YOLO-format .txt labels
)
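With `save_txt=True`, Ultralytics writes one text file per image under the run's `labels/` folder, with one detection per line in normalized `class x_center y_center width height` form (a confidence value is appended only when `save_conf=True` is also set). The sketch below converts such a line back to pixel corners; `label_line_to_xyxy` is a hypothetical helper name.

```python
def label_line_to_xyxy(line, img_w, img_h):
    """Convert one 'class xc yc w h [conf]' line (normalized) to pixel corners."""
    parts = line.split()
    cls = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    # The sixth field is present only when confidences were saved
    conf = float(parts[5]) if len(parts) > 5 else None
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return cls, (x1, y1, x2, y2), conf


if __name__ == "__main__":
    print(label_line_to_xyxy("0 0.5 0.5 0.2 0.1", img_w=640, img_h=480))
```

The image width and height must come from the source image, since the label file stores only normalized coordinates.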
Export to Other Formats
from ultralytics import YOLO

model = YOLO("best.pt")
# ONNX
model.export(format="onnx", imgsz=640, simplify=True)
# TensorRT engine with FP16 precision
model.export(format="engine", imgsz=640, half=True)
# Core ML
model.export(format="coreml", imgsz=640)
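If an exported model is run outside the Ultralytics API, resizing the input and mapping boxes back to the source image become the caller's job. The sketch below assumes an aspect-preserving resize with centered padding into a 640x640 input (letterboxing); the function names are hypothetical, and the exact preprocessing your runtime needs may differ.

```python
def letterbox_params(img_w, img_h, size=640):
    """Return (scale, pad_x, pad_y) for an aspect-preserving resize into size x size."""
    scale = min(size / img_w, size / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    # Padding is split evenly on both sides (centered letterbox, an assumption)
    pad_x = (size - new_w) / 2
    pad_y = (size - new_h) / 2
    return scale, pad_x, pad_y


def to_original(xyxy, scale, pad_x, pad_y):
    """Map a box from letterboxed coordinates back to the source image."""
    x1, y1, x2, y2 = xyxy
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


if __name__ == "__main__":
    scale, pad_x, pad_y = letterbox_params(1920, 1080)
    print(scale, pad_x, pad_y)
    print(to_original((0, 140, 640, 500), scale, pad_x, pad_y))
```

For a 1920x1080 screenshot, the image is scaled by one third to 640x360 and padded by 140 pixels at the top and bottom.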
Model Files
| File | Description |
| --- | --- |
| best.pt | Best checkpoint (highest mAP@50) |
| args.yaml | Training configuration |
| results.csv | Training metrics per epoch |
| results.png | Training curves visualization |
| confusion_matrix.png | Confusion matrix |
| BoxPR_curve.png | Precision-Recall curve |
Recommended Inference Parameters
| Parameter | Recommended | Description |
| --- | --- | --- |
| conf | 0.25 | Confidence threshold |
| iou | 0.7 | NMS IoU threshold |
| imgsz | 640-1024 | Input image size |
| max_det | 300 | Maximum detections per image |
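The `iou` parameter controls non-maximum suppression: when two predicted boxes overlap by more than the threshold, the lower-confidence one is dropped. The sketch below shows how that overlap is measured; `box_iou` is an illustrative helper, not the Ultralytics implementation.

```python
def box_iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Half-overlapping boxes: below a 0.7 threshold, so both would be kept
    print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
    # Nearly identical boxes: above 0.7, so one would be suppressed
    print(box_iou((0, 0, 10, 10), (1, 0, 11, 10)))
```

Lowering `iou` suppresses more overlapping boxes, which can help when adjacent text lines produce duplicate detections; raising it keeps more of them.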
Use Cases
- OCR Preprocessing: Detect text regions before applying OCR
- Document Analysis: Locate text areas in scanned documents
- UI Automation: Find text elements in application screenshots
- Scene Text Detection: Detect text in natural images
- PDF Processing: Extract text region locations
Limitations
- Optimized for horizontal text; may have reduced accuracy on rotated text
- Trained primarily on document and UI images
- Single class (text) - does not distinguish between text types
- Best performance at 640px input size
Citation
@software{yolo11n_text,
author = {Ultralytics},
title = {YOLO11n Text},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/datasets/DonkeySmall/Yolo-Text-Detection}
}
@software{ultralytics_yolo,
author = {Jocher, Glenn and Chaurasia, Ayush and Qiu, Jing},
title = {Ultralytics YOLO},
year = {2023},
publisher = {GitHub},
url = {https://github.com/ultralytics/ultralytics}
}
License
This model is released under the Apache 2.0 License.
Acknowledgments