A collection of Qwen3-4B models trained on the ToolUse dataset with teacher-regularized (TR) reinforcement learning methods. It includes the GRPO-TR, RLSD-TR, SDPO-TR, and SRPO-TR models.
SeongryongJung/Qwen3-4B-Tooluse-GRPO-TR
Text Generation • 4B • Updated • 50
SeongryongJung/Qwen3-4B-Tooluse-RLSD-TR
Text Generation • 4B • Updated • 49
SeongryongJung/Qwen3-4B-ToolUse-SDPO-TR
Reinforcement Learning • 4B • Updated
SeongryongJung/Qwen3-4B-ToolUse-SRPO-TR
Reinforcement Learning • 4B • Updated