NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14 • 144
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14 • 49
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7 • 74
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30 • 97
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024 • 57
DreamLLM: Synergistic Multimodal Comprehension and Creation Paper • 2309.11499 • Published Sep 20, 2023 • 59