# GRPO Fine-tuned Qwen2.5-0.5B for IIT-JEE Math

## Model Description

This model fine-tunes Qwen2.5-0.5B-Instruct with Group Relative Policy Optimization (GRPO) on IIT-JEE mathematics datasets, targeting step-by-step solutions that end with a final answer in \boxed{} format.
## Training Details
- Base Model: Qwen/Qwen2.5-0.5B-Instruct
- Method: GRPO (Reinforcement Learning)
- Datasets: JEEBench, JEE Main 2025, JEE-NEET Benchmark
- LoRA Rank: 32
- Training Epochs: 3
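The card does not publish the training script, so here is a minimal sketch of how a run with these settings might look using TRL's `GRPOTrainer`. The dataset id, column names, reward weight, and hyperparameters below are assumptions for illustration, not the actual configuration used for this model:

```python
# Hypothetical GRPO training sketch (TRL); the real script for this model is unpublished.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    # Reward completions that wrap the final answer in \boxed{...}
    return [0.3 if "\\boxed{" in c else 0.0 for c in completions]

# Assumed dataset id and column name; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("daman1209arora/jeebench", split="test")
dataset = dataset.rename_column("question", "prompt")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=format_reward,
    args=GRPOConfig(output_dir="grpo-jee-math", num_train_epochs=3),
    train_dataset=dataset,
    peft_config=LoraConfig(r=32, lora_alpha=64, task_type="CAUSAL_LM"),
)
trainer.train()
```

In GRPO the trainer samples a group of completions per prompt and normalizes rewards within the group, so the reward functions only need to return raw per-completion scores.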
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Yagna1/grpo-qwen2.5-0.5b-jee-math")
tokenizer = AutoTokenizer.from_pretrained("Yagna1/grpo-qwen2.5-0.5b-jee-math")

messages = [
    {"role": "system", "content": "You are a math solver. Solve step-by-step and provide your final answer in \\boxed{} format."},
    {"role": "user", "content": "What is the derivative of x^2?"},
]

# Build the chat prompt and append the assistant turn marker before generating
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Reward Functions

- Format reward (weight 0.3): requires the final answer to appear in \boxed{} format
- Correctness reward (weight 1.0): checks the answer against the reference solution
- Length reward (weight 0.1): encourages concise solutions (10-300 words)
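The reward functions themselves are not included with the card. A plain-Python sketch of how the three components described above could be implemented (the function names, the extraction regex, and exact-string answer matching are assumptions):

```python
import re

# Hypothetical implementation; the model's actual reward code is unpublished.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def format_reward(completion: str) -> float:
    # 0.3 if the completion puts an answer in \boxed{...}
    return 0.3 if BOXED.search(completion) else 0.0

def correctness_reward(completion: str, reference: str) -> float:
    # 1.0 if the boxed answer matches the reference exactly
    m = BOXED.search(completion)
    return 1.0 if m and m.group(1).strip() == reference.strip() else 0.0

def length_reward(completion: str) -> float:
    # 0.1 if the solution is concise: between 10 and 300 words
    return 0.1 if 10 <= len(completion.split()) <= 300 else 0.0

def total_reward(completion: str, reference: str) -> float:
    # Sum of the three weighted components; maximum possible reward is 1.4
    return (format_reward(completion)
            + correctness_reward(completion, reference)
            + length_reward(completion))
```

A real correctness check would normally normalize equivalent expressions (e.g. `2x` vs `2*x`) rather than compare strings exactly.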
## Limitations
- Optimized for IIT-JEE level mathematics
- Best performance on algebra, calculus, and geometry problems
- May require multiple generations for complex problems
## Citation

```bibtex
@misc{grpo-jee-math-2024,
  author    = {Yagna1},
  title     = {GRPO Fine-tuned Qwen2.5 for IIT-JEE Math},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Yagna1/grpo-qwen2.5-0.5b-jee-math}
}
```