NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval Paper • 2603.12824 • Published 10 days ago • 5
Taking Shortcuts for Categorical VQA Using Super Neurons Paper • 2603.10781 • Published 11 days ago • 6
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation Paper • 2602.19163 • Published 28 days ago • 14
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation Paper • 2602.12160 • Published Feb 12 • 38
AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction Paper • 2601.00796 • Published Jan 2 • 32
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published Jan 2 • 57
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1 • 133
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization Paper • 2512.10955 • Published Dec 11, 2025 • 7
Efficiently Reconstructing Dynamic Scenes One D4RT at a Time Paper • 2512.08924 • Published Dec 9, 2025 • 21
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published Dec 8, 2025 • 51
RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published Nov 21, 2025 • 28
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs Paper • 2511.17220 • Published Nov 21, 2025 • 19
Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation Paper • 2511.10547 • Published Nov 13, 2025 • 5
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 39
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models Paper • 2511.10629 • Published Nov 13, 2025 • 129