qwen3.5-hard-only-r4

Summary

  • Base model: Qwen/Qwen3.5-4B
  • Dataset: tytodd/qwen3.5-4b-v1
  • Checkpoint: tytodd/qwen3.5-hard-only-r4

OOD Evaluation

benchmark             n     auroc   accuracy
arc_challenge         1000  0.8875  0.8890
judge_bench           278   0.7065  0.6583
mmlu                  1000  0.7550  0.7680
mmlu_pro              1000  0.6889  0.7070
rod101_essay_scoring  81    0.7115  0.7407
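
This card does not say how the auroc and accuracy columns were computed. As a rough sketch only, assuming each evaluated item has a binary label (1 or 0) and a scalar score in [0, 1], the two metrics could be computed as below. The function names, the 0.5 threshold, and the toy data are illustrative assumptions, not taken from the evaluation code behind this checkpoint.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties."""
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of items that share the same score.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j all receive their average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of items whose thresholded score matches the label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data, not drawn from any benchmark in the table above.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores), accuracy(toy_labels, toy_scores))
```

AUROC depends only on how the scores rank the items, while accuracy depends on the chosen threshold. That is one reason the two columns can order benchmarks differently.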

MMLU AUROC with Tuning (by percentage of data used to train)

[Plots: accuracy_vs_train_n, accuracy_vs_train_pct, auroc_vs_train_pct, auroc_vs_train_n]
