Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games Paper • 2606.19338 • Published 3 days ago • 43
OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs Paper • 2606.03890 • Published 18 days ago • 31
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation Paper • 2605.31264 • Published 22 days ago • 115
SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction Paper • 2605.20110 • Published May 19 • 4
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published May 11 • 46
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published Mar 26 • 133