# F-GRPO Training for Qwen Math Reasoning

Training a Qwen model with F-GRPO (Focal Loss-enhanced Group Relative Policy Optimization).
## Paper

Based on "F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare" (Feb 2026).
## Method
F-GRPO improves upon standard GRPO by:
- Down-weighting easy prompts (high success rate)
- Up-weighting hard prompts (low success rate)
- Using Focal Loss-inspired scaling (see the sketch after this list): `advantage_fgrpo = (1 - success_rate)^gamma * advantage_grpo`
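The scaling is easy to express directly. Below is a minimal sketch for a single prompt group, assuming binary correctness rewards; the helper name `fgrpo_advantages` and the default `gamma = 2.0` (the classic focal-loss value) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fgrpo_advantages(rewards: np.ndarray, gamma: float = 2.0) -> np.ndarray:
    """Focal-style reweighting of GRPO advantages for one prompt group.

    rewards: binary correctness rewards for the G sampled completions
             of a single prompt (1.0 = correct, 0.0 = incorrect).
    """
    # Standard GRPO: group-normalized advantages.
    adv_grpo = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # The prompt's success rate is the fraction of correct completions.
    success_rate = rewards.mean()
    # Focal scaling: easy prompts (high success rate) are down-weighted,
    # hard prompts (low success rate) keep most of their signal.
    return (1.0 - success_rate) ** gamma * adv_grpo

# Example: an easy prompt (7/8 correct) vs. a hard one (1/8 correct).
easy = np.array([1, 1, 1, 1, 1, 1, 1, 0], dtype=float)
hard = np.array([1, 0, 0, 0, 0, 0, 0, 0], dtype=float)
print(fgrpo_advantages(easy))  # scaled by (1/8)^gamma
print(fgrpo_advantages(hard))  # scaled by (7/8)^gamma
```

With `gamma = 2.0`, a prompt solved 7/8 of the time contributes advantages scaled by `(1/8)^2 ≈ 0.016`, while one solved only 1/8 of the time retains `(7/8)^2 ≈ 0.77` of its signal.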
## Configuration
- Model: Qwen/Qwen2.5-3B (fits on a 16 GB T4 Small)
- Dataset: DeepMath-103K or GSM8K
- Steps: 50 (a minimal training run, following Raschka's finding)
- Hardware: T4 Small GPU (16 GB VRAM)
## Training Script
`train_fgrpo_trl.py` - main training script using TRL's `GRPOTrainer`
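The script itself is not reproduced here; the sketch below shows roughly how such a run can be wired with TRL's `GRPOTrainer`, assuming GSM8K and a hypothetical binary correctness reward. Note that a stock `GRPOTrainer` computes standard GRPO advantages, so the focal reweighting from the Method section would still need to be applied on top (e.g., in a trainer subclass).

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K stores the gold answer after "####"; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def correctness_reward(completions, answer, **kwargs):
    # Hypothetical binary reward: 1.0 if the gold final answer appears
    # in the completion, else 0.0. Extra dataset columns (here "answer")
    # are passed to reward functions as keyword arguments by TRL.
    gold = [a.split("####")[-1].strip() for a in answer]
    return [1.0 if g in c else 0.0 for c, g in zip(completions, gold)]

args = GRPOConfig(
    output_dir="fgrpo-qwen",
    max_steps=50,                   # the minimal run from the Configuration above
    num_generations=8,              # group size G sampled per prompt
    per_device_train_batch_size=8,  # must be divisible by num_generations
    fp16=True,                      # T4 GPUs have no bf16 support
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B",
    reward_funcs=correctness_reward,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```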
## Expected Results
From the paper's Qwen2.5-7B experiments:
- pass@256: 64.1 → 70.3 (+6.2 points)
- No extra compute cost
## Usage
This Space automatically runs training on startup. Results are pushed to silentspuck/fgrpo-qwen-math-test.
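Once the weights land on the Hub, the checkpoint loads like any other causal LM; a sketch with standard `transformers` usage (the sample question is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "silentspuck/fgrpo-qwen-math-test"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "If a train travels 60 miles per hour for 2.5 hours, how far does it go?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```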