Diffusion Language Models are Super Data Learners Paper • 2511.03276 • Published Nov 5, 2025 • 128
Defeating the Training-Inference Mismatch via FP16 Paper • 2510.26788 • Published Oct 30, 2025 • 29
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30, 2025 • 82
Efficient Process Reward Model Training via Active Learning Paper • 2504.10559 • Published Apr 14, 2025 • 13
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
Fostering Video Reasoning via Next-Event Prediction Paper • 2505.22457 • Published May 28, 2025 • 29
Reinforcing General Reasoning without Verifiers Paper • 2505.21493 • Published May 27, 2025 • 26
🚀 Active PRM Collection Efficient Process Reward Model Training via Active Learning. • 4 items • Updated Apr 16, 2025 • 3