firas snake

abol3z

AI & ML interests

None yet

Organizations

None yet

liked a dataset about 1 month ago

nvidia/miracl-vision

Viewer • Updated May 20, 2025 • 695k • 238 • 12

upvoted a paper about 1 month ago

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Paper • 2605.05242 • Published May 3 • 126

liked a Space about 1 month ago

Tevatron/BrowseComp-Plus

🔍

Fair and Disentangled Evaluation of Deep-Research Agents

liked 2 models 6 months ago

asafaya/bert-base-arabic

Fill-Mask • 0.1B • Updated Mar 17, 2023 • 9.67k • 40

TomoroAI/tomoro-colqwen3-embed-4b

Visual Document Retrieval • 4B • Updated Dec 7, 2025 • 90.2k • 31

commented on Supercharge your OCR Pipelines with Open Models 8 months ago

@doladoo yes. I tried Paddle, Miner, Marker, OlmOCR, Chandra-OCR, and Docling without VL.

Overall, for Arabic, the VLM approach performed better, and OlmOCR was the best.

Note that my documents are mostly scanned text and tables, nothing more.

commented on Supercharge your OCR Pipelines with Open Models 8 months ago

@doladoo both versions 1 and 2.
@merve from my testing, it didn't lose its multilingual skills; in fact, it made them much better. I tested both Qwen2.5-VL and OlmOCR, and the latter is the best on Arabic text.

commented on Supercharge your OCR Pipelines with Open Models 8 months ago

If only this came last week! I spent the last week learning about and benchmarking all these plus extra models, and I wanna point out a correction: OlmOCR isn't an English-only model. In fact, it produced the best results across all VLM and non-VLM frameworks on my Arabic language corpus.