GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published Jun 18 • 39
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 550
Article Introducing Training Cluster as a Service - a new collaboration with NVIDIA +1 Jun 11 • 26
Vision Language Models Papers 🖼️💬📝 Collection Papers about vision-language models; the most important ones are at the top of the list. • 27 items • Updated Apr 30, 2024 • 40
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 147
SmolVLA Collection Small, efficient and light-weight VLAs pretrained on community datasets • 1 item • Updated Sep 5 • 31
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation Paper • 2401.02117 • Published Jan 4, 2024 • 33
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data +7 Jun 3 • 299
Multimodal DSE Retrievers Collection A collection of DSE models for multimodal retrieval • 5 items • Updated Apr 15 • 15
Article NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets +3 Mar 18 • 42
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control +2 Feb 4 • 186