Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published 11 days ago • 20
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 5 days ago • 46
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published 10 days ago • 19
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published 12 days ago • 65
Exploring MLLM-Diffusion Information Transfer with MetaCanvas Paper • 2512.11464 • Published 16 days ago • 12
V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties Paper • 2512.11799 • Published 16 days ago • 29
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published 19 days ago • 111
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published 23 days ago • 28
SIMA 2: A Generalist Embodied Agent for Virtual Worlds Paper • 2512.04797 • Published 24 days ago • 23
NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation Paper • 2512.05106 • Published 24 days ago • 15
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published 24 days ago • 168
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization Paper • 2511.15705 • Published Nov 19 • 92
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published Nov 19 • 74
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space Paper • 2511.10555 • Published Nov 13 • 60
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model Paper • 2511.13647 • Published Nov 17 • 70