MAI-UI Technical Report: Real-World Centric Foundation GUI Agents • Paper • 2512.22047 • Published 10 days ago
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion • Paper • 2512.17504 • Published 17 days ago
Cosmos-Reason1 • Collection: Multimodal world understanding through reasoning • 8 items • Updated 13 days ago
SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data • Article • Jun 3, 2025
LeRobot Community Datasets: The “ImageNet” of Robotics — When and How? • Article • May 11, 2025
π0 and π0-FAST: Vision-Language-Action Models for General Robot Control • Article • Feb 4, 2025
Android in the Wild: A Large-Scale Dataset for Android Device Control • Paper • 2307.10088 • Published Jul 19, 2023
OpenMask3D: Open-Vocabulary 3D Instance Segmentation • Paper • 2306.13631 • Published Jun 23, 2023