Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published 6 days ago • 32
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 6 days ago • 225
Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published Dec 18, 2025 • 42
One-step Latent-free Image Generation with Pixel Mean Flows Paper • 2601.22158 • Published Jan 29 • 18
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published Jan 11 • 215
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection Paper • 2512.23273 • Published Dec 29, 2025 • 15
Article Transformers v5: Simple model definitions powering the AI ecosystem Dec 1, 2025 • 307
MobileLLM-R1 Collection MobileLLM-R1, a series of sub-billion parameter reasoning models • 10 items • Updated Nov 21, 2025 • 30
VisionThink Collection Efficient Reasoning Vision Language Model • 7 items • Updated Jul 18, 2025 • 7
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13, 2025 • 74
Sora Reference Papers Collection A collection of all papers referenced in OpenAI's "Video generation models as world simulators" technical report • openai.com/sora • 30 items • Updated 29 days ago • 52
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24, 2024 • 207