Dak-OCR v1: Khasi Optical Character Recognition model
Dak-OCR is a fine-tuned version of DeepSeek-OCR-2 for accurate OCR, document understanding, and handwriting recognition in the Khasi language.
It was trained on the custom Khasi-OCR-36K dataset to reduce hallucination and repetition issues often seen in base multimodal models when working with low-resource languages. The model is designed to preserve document structure and produce clean Markdown output.
Model Highlights
- Language Support: Native Khasi (using Latin script with special characters like ï and ñ) and English.
- Task: Specialized for "Free OCR" (transcribing document images into Markdown-formatted text, preserving headings, paragraphs, lists, and tables).
- Robustness: Highly resilient to degraded, noisy, and historical scans.
- Base Model: unsloth/DeepSeek-OCR-2
- Hardware & Performance: Training was conducted on an NVIDIA A100 SXM (80 GB VRAM, 16 vCPUs) in native bfloat16 precision, chosen for output quality and memory efficiency.
- LoRA Setup: Adaptation was implemented with LoRA at rank 64 (r=64) and lora_alpha=128. LoRA layers were applied broadly to linear components across the vision encoders, language layers, attention blocks, and MLP modules to enable strong task specialization.
- Vision Processing: To handle large or dense document pages, the model uses dynamic high-resolution multi-patch cropping (crop_mode=True) with a base resolution of 1024 (base_size=1024) and an image size of 768 (image_size=768), avoiding the loss of detail caused by aggressive downscaling.
- Model Precision: The final model is provided in native bfloat16 precision.
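To make the LoRA numbers above concrete, the sketch below assumes the standard LoRA formulation (as used by PEFT when rank-stabilized LoRA is off), where the low-rank update B @ A is scaled by lora_alpha / r, so r=64 with lora_alpha=128 gives a factor of 2.0. Each adapted linear layer of shape d_in x d_out gains r * (d_in + d_out) trainable parameters. The 1280 x 1280 layer size is hypothetical and is not taken from the DeepSeek-OCR-2 architecture.

```python
def lora_scaling(r: int, lora_alpha: float) -> float:
    """Standard LoRA scale applied to B @ A (rsLoRA would use alpha / sqrt(r) instead)."""
    return lora_alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    A has shape (r, d_in) and B has shape (d_out, r); the base weight stays frozen.
    """
    return r * d_in + d_out * r


if __name__ == "__main__":
    r, alpha = 64, 128  # values reported for Dak-OCR v1
    print(lora_scaling(r, alpha))  # 2.0

    # Hypothetical layer size, for illustration only.
    d_in = d_out = 1280
    full = d_in * d_out
    added = lora_param_count(d_in, d_out, r)
    print(added, f"{added / full:.1%} of the frozen weight")  # 163840 10.0% of the frozen weight
```

A higher rank gives the adapter more capacity for task specialization, at the cost of more trainable parameters per adapted layer.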
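The dynamic multi-patch cropping described in the Vision Processing item can be pictured with the sketch below. It is an illustration under stated assumptions, not the model's actual preprocessing code: it assumes a page is split into a grid of image_size x image_size tiles (768 px) whose column/row layout best matches the page's aspect ratio, within a hypothetical budget of 2 to 6 tiles, alongside one global view at base_size (1024 px). The function names and tile bounds are illustrative; the real limits are defined in the model's remote code.

```python
from itertools import product


def choose_grid(width: int, height: int, min_tiles: int = 2, max_tiles: int = 6) -> tuple[int, int]:
    """Return (cols, rows) whose aspect ratio is closest to width / height.

    Ties are broken in favour of fewer tiles. The tile bounds are
    hypothetical defaults, not values read from DeepSeek-OCR-2.
    """
    target = width / height
    candidates = [
        (cols, rows)
        for cols, rows in product(range(1, max_tiles + 1), repeat=2)
        if min_tiles <= cols * rows <= max_tiles
    ]
    return min(candidates, key=lambda g: (abs(g[0] / g[1] - target), g[0] * g[1]))


def plan_views(width: int, height: int, base_size: int = 1024, image_size: int = 768) -> dict:
    """Describe the global view and the local tiles planned for one page."""
    cols, rows = choose_grid(width, height)
    return {
        "global_view": (base_size, base_size),
        "grid": (cols, rows),
        "resized_for_tiling": (cols * image_size, rows * image_size),
        "num_tiles": cols * rows,
    }


# A portrait 1536 x 2048 scan maps to a 2 x 3 grid of 768 px tiles.
print(plan_views(1536, 2048))
```

Because each tile is processed near its native resolution, small diacritics such as the ones on ï and ñ are less likely to be blurred away than if the whole page were shrunk to a single 1024 px view.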
Performance & Evaluation
The model was evaluated on a mixed set of 40 highly dense Khasi samples containing complex Markdown and degraded or noisy scans.
EVALUATION RESULTS
| Metric | Score (lower is better) |
|---|---|
| WER (word error rate) | 1.71% |
| CER (character error rate) | 0.91% |
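The card does not state how WER and CER were computed. A common definition is the Levenshtein edit distance (substitutions + deletions + insertions) between the reference transcription and the model output, divided by the number of reference tokens: words for WER, characters for CER. The sketch below follows that definition and assumes whitespace tokenization, corpus-level aggregation, and Unicode NFC normalization before scoring. Normalization matters for Khasi because letters such as ï and ñ can be stored either as one precomposed code point or as a base letter plus a combining mark, and without it the two encodings would be counted as an error. These choices are assumptions, not the authors' documented evaluation procedure.

```python
import unicodedata


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance (substitutions, deletions, insertions) via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            ))
        prev = curr
    return prev[-1]


def _tokens(text: str, unit: str) -> list:
    # NFC makes precomposed and decomposed forms of ï / ñ compare equal.
    text = unicodedata.normalize("NFC", text)
    return text.split() if unit == "word" else list(text)  # CER counts spaces too


def error_rate(references: list[str], hypotheses: list[str], unit: str = "word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char"): total edits / total reference tokens."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_toks, hyp_toks = _tokens(ref, unit), _tokens(hyp, unit)
        edits += edit_distance(ref_toks, hyp_toks)
        total += len(ref_toks)
    if total == 0:
        raise ValueError("references contain no tokens")
    return edits / total


refs = ["ka ktien khasi"]
hyps = ["ka ktien"]
print(error_rate(refs, hyps, unit="word"))  # one deleted word out of three reference words
```

Averaging per-sample rates instead of pooling edits across the corpus can give noticeably different numbers on a small set such as 40 samples, so the aggregation method is worth stating alongside any reported score.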
Usage
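```python
from unsloth import FastVisionModel  # import unsloth before transformers
from transformers import AutoModel
import torch

# Load the model in bfloat16 (no 4-bit quantization)
model, tokenizer = FastVisionModel.from_pretrained(
    "toiar/Dak-OCR-v1",
    load_in_4bit = False,
    auto_model = AutoModel,
    trust_remote_code = True,
    torch_dtype = torch.bfloat16,
)
FastVisionModel.for_inference(model)

# Greedy decoding: disable sampling and clear sampling-only parameters
model.generation_config.do_sample = False
model.generation_config.temperature = None
model.generation_config.top_p = None

# Inference with the "Free OCR" prompt and the vision settings described above
prompt = "<image>\nFree OCR."
image_path = "path/to/your/khasi_document.png"
output_dir = "path/to/output_dir"  # output directory for infer(), as in the upstream DeepSeek-OCR-2 example

with torch.no_grad():
    output = model.infer(
        tokenizer,
        prompt=prompt,
        image_file=image_path,
        output_path=output_dir,
        base_size=1024,
        image_size=768,
        crop_mode=True,
    )
print(output)
```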
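To run the same settings over a folder of scanned pages, the inference call can be wrapped in a loop. The sketch below is hypothetical: the helper name, the accepted file extensions, and the per-page output directory layout are assumptions, and it only forwards the keyword arguments from the usage example plus output_path, which the upstream DeepSeek-OCR-2 example passes to infer(). It collects whatever model.infer returns for each page, so check the model's remote code for what that value contains.

```python
from pathlib import Path

# Hypothetical set of file types treated as page images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".webp"}


def transcribe_folder(model, tokenizer, folder, output_root="dak_ocr_output"):
    """Run Free OCR on every image in `folder`, in filename order.

    Returns {filename: value returned by model.infer}. Each page gets its own
    sub-directory under `output_root` (an assumed layout, not part of the card).
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip sub-directories and non-image files
        page_dir = Path(output_root) / path.stem
        page_dir.mkdir(parents=True, exist_ok=True)
        results[path.name] = model.infer(
            tokenizer,
            prompt="<image>\nFree OCR.",
            image_file=str(path),
            output_path=str(page_dir),
            base_size=1024,
            image_size=768,
            crop_mode=True,
        )
    return results
```

After loading the model as in the usage example, a call such as `transcribe_folder(model, tokenizer, "path/to/scans")` processes the pages one at a time, which keeps peak memory close to that of a single-page run.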