Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem • Paper • 2512.24873 • Published 1 day ago • 28
Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs • Paper • 2512.17206 • Published 13 days ago • 19
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization • Paper • 2510.13554 • Published Oct 15, 2025 • 57
Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony • Paper • 2510.11345 • Published Oct 13, 2025 • 15
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library • Paper • 2506.06122 • Published Jun 6, 2025 • 7
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? • Paper • 2502.19361 • Published Feb 26, 2025 • 28
ProgCo: Program Helps Self-Correction of Large Language Models • Paper • 2501.01264 • Published Jan 2, 2025 • 26
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models • Paper • 2411.07140 • Published Nov 11, 2024 • 35
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework • Paper • 2405.11143 • Published May 20, 2024 • 40