BLEUBERI: BLEU is a surprisingly effective reward for instruction following Paper • 2505.11080 • Published May 16, 2025
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published Feb 25, 2025