---
base_model: JetLM/SDAR-8B-Chat
language:
- en
- zh
license: apache-2.0
tags:
- math
- reasoning
- diffusion
model_type: sdar
pipeline_tag: text-generation
library_name: transformers
---
<h1 align="center">DiRL-8B-Instruct</h1>
<p align="center">
<a href="https://arxiv.org/abs/2512.22234">
<img src="https://img.shields.io/badge/arXiv-2512.22234-b31b1b.svg" alt="Paper on arXiv"/>
</a>
<a href="https://github.com/OpenMOSS/DiRL">
<img src="https://img.shields.io/badge/GitHub-Code-black.svg?logo=github" alt="GitHub Code"/>
</a>
</p>
## Introduction
**DiRL-8B-Instruct** is an 8B-parameter diffusion language model specialized for mathematical reasoning. It is trained from [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat) with the [DiRL](https://github.com/OpenMOSS/DiRL) framework. Through two-stage training (supervised fine-tuning followed by reinforcement learning), DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks, outperforming even 32B models on most tasks.
> **Highlights**
>
> * **SOTA Performance:** Achieves **83.05%** on MATH500, **20.63%** on AIME2024, and **20.83%** on AIME2025, surpassing all 8B baselines.
> * **Training Framework:** Trained with [DiRL](https://github.com/OpenMOSS/DiRL), an efficient training framework for diffusion language models.
> * **Strong Baseline:** Built on [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat), gaining **+11.20%** on MATH500 and **+11.46%** on AIME2024.
## Inference
### Using LMDeploy
```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
from transformers import AutoTokenizer
model_path = "OpenMOSS-Team/DiRL-8B-Instruct"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Prepare prompts
prompts = [
    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
]
prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)
# Configure backend for DLLM inference
backend_config = PytorchEngineConfig(
dtype="float16",
max_prefill_token_num=8192,
cache_max_entry_count=0.8,
dllm_block_length=4,
dllm_denoising_steps=4,
dllm_unmasking_strategy="low_confidence_dynamic",
dllm_confidence_threshold=0.9,
)
# Create inference pipeline
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(
        top_p=1.0,
        top_k=50,
        temperature=1.0,
        do_sample=False,  # greedy decoding; the sampling knobs above are then ignored
        max_new_tokens=8192,
    )
    outputs = pipe(prompts, gen_config=gen_config)
    for output in outputs:
        print(output.text)
```
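The `dllm_*` options above control blockwise denoising: tokens are generated in blocks of `dllm_block_length`, and within each block the `low_confidence_dynamic` strategy unmasks every token whose predicted confidence clears `dllm_confidence_threshold` at each step. The toy sketch below illustrates the idea only; `denoise_block` and its inputs are hypothetical and do not reflect LMDeploy's actual implementation.

```python
def denoise_block(step_confidences, threshold=0.9):
    """Toy model of confidence-threshold unmasking for one block.

    step_confidences: per-step list of per-token confidences.
    At each step, every still-masked token above the threshold is
    unmasked; if none qualifies, the single most confident token is
    unmasked so the step always makes progress. Returns steps used.
    """
    n = len(step_confidences[0])
    unmasked = [False] * n
    steps_used = 0
    for conf in step_confidences:
        if all(unmasked):
            break
        steps_used += 1
        masked = [i for i in range(n) if not unmasked[i]]
        chosen = [i for i in masked if conf[i] >= threshold]
        if not chosen:
            chosen = [max(masked, key=lambda i: conf[i])]
        for i in chosen:
            unmasked[i] = True
    return steps_used

# A confident model can finish a 4-token block in fewer than 4 steps,
# which is where the speedup over per-token decoding comes from.
confs = [
    [0.95, 0.40, 0.92, 0.30],  # step 1: tokens 0 and 2 clear 0.9
    [0.00, 0.97, 0.00, 0.91],  # step 2: tokens 1 and 3 clear 0.9
]
print(denoise_block(confs))  # 2 steps instead of 4
```

Raising `dllm_confidence_threshold` trades speed for caution: fewer tokens clear the bar per step, so more denoising steps are spent per block.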
## Performance
| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
|-------|---------|-------|----------|----------|---------------|---------|
| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
| Qwen2.5-32B-Instruct | 81.13 | **94.03** | 12.92 | 11.88 | 45.65 | 49.12 |
| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
| **DiRL-8B-Instruct** | **83.05** | 93.03 | **20.63** | **20.83** | **46.40** | **52.79** |
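The Average column is the unweighted mean of the five benchmark scores; for example, for DiRL-8B-Instruct:

```python
# Unweighted mean over the five benchmarks (scores from the table above)
scores = {
    "MATH500": 83.05,
    "GSM8K": 93.03,
    "AIME2024": 20.63,
    "AIME2025": 20.83,
    "OlympiadBench": 46.40,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 52.79
```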
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{zhu2025dirl,
title={DiRL: An Efficient Post-Training Framework for Diffusion Language Models},
  author={Zhu, Ying and Wan, Jiaxin and Liu, Xiaoran and He, Siyang and Wang, Qiqi and Guo, Xu and Liang, Tianyi and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
year={2025},
eprint={2512.22234},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.22234}
}
```