view article Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator 16 days ago • 35
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published 17 days ago • 67
World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty Paper • 2512.05927 • Published 28 days ago • 11
Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation Paper • 2512.03534 • Published about 1 month ago • 20
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch Paper • 2512.02395 • Published Dec 2, 2025 • 47
CaptionQA: Is Your Caption as Useful as the Image Itself? Paper • 2511.21025 • Published Nov 26, 2025 • 27
World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models Paper • 2511.22787 • Published Nov 27, 2025 • 9
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published Nov 19, 2025 • 74
Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework Paper • 2511.13189 • Published Nov 17, 2025 • 39
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published Nov 17, 2025 • 25
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark Paper • 2511.13853 • Published Nov 17, 2025 • 34
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space Paper • 2511.10555 • Published Nov 13, 2025 • 60
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models Paper • 2511.08577 • Published Nov 11, 2025 • 105
Agent READMEs: An Empirical Study of Context Files for Agentic Coding Paper • 2511.12884 • Published Nov 17, 2025 • 12
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published Nov 12, 2025 • 16
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12, 2025 • 201
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published Nov 11, 2025 • 41