On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 8, 2025
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025
TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question Answering Paper • 2506.03949 • Published Jun 4, 2025
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025
WideSearch: Benchmarking Agentic Broad Info-Seeking Paper • 2508.07999 • Published Aug 11, 2025
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning Paper • 2506.01710 • Published Jun 2, 2025