Personalized Preference Fine-tuning of Diffusion Models Paper • 2501.06655 • Published Jan 11, 2025 • 1
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models Paper • 2502.17387 • Published Feb 24, 2025 • 7
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems Paper • 2510.02263 • Published Oct 2, 2025 • 9
SynthLabsAI/ALP_DeepScaleR_1.5B_C16K Reinforcement Learning • 2B • Updated Jun 24, 2025 • 3 • 3
Adaptive Length Penalty Collection Teaching language models to think efficiently with Adaptive Length Penalty (ALP) • 3 items • Updated Jun 24, 2025 • 1
Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning Paper • 2506.05256 • Published Jun 5, 2025 • 2
OpenThoughts: Data Recipes for Reasoning Models Paper • 2506.04178 • Published Jun 4, 2025 • 54
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5, 2025 • 60
Learning Adaptive Parallel Reasoning with Language Models Paper • 2504.15466 • Published Apr 21, 2025 • 44
Big-Math Collection This collection contains assets associated with the Big-Math dataset, a high-quality collection of over 250,000 math questions with verifiable answers • 4 items • Updated Apr 16, 2025 • 7