---
base_model: JetLM/SDAR-8B-Chat
language:
- en
- zh
license: apache-2.0
tags:
- math
- reasoning
- diffusion
model_type: sdar
pipeline_tag: text-generation
library_name: transformers
---

<h1 align="center">DiRL-8B-Instruct</h1>

<p align="center">
  <a href="https://arxiv.org/abs/2512.22234">
    <img src="https://img.shields.io/badge/arXiv-2512.22234-b31b1b.svg" alt="Paper on arXiv"/>
  </a>
  <a href="https://github.com/OpenMOSS/DiRL">
    <img src="https://img.shields.io/badge/GitHub-Code-black.svg?logo=github" alt="GitHub Code"/>
  </a>
</p>

## Introduction

**DiRL-8B-Instruct** is an 8B-parameter diffusion language model specialized for mathematical reasoning. It is trained with the [DiRL](https://github.com/OpenMOSS/DiRL) framework on top of [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat). Through two-stage training (supervised fine-tuning followed by reinforcement learning), DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks, even outperforming 32B models on most tasks.

> **Highlights**
> 
> * **SOTA Performance:** Achieves **83.05%** on MATH500, **20.63%** on AIME2024, and **20.83%** on AIME2025, surpassing all 8B baselines.
> * **Training Framework:** Trained with [DiRL](https://github.com/OpenMOSS/DiRL), an efficient training framework for diffusion language models.
> * **Strong Baseline:** Built on [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat), gaining **+11.20%** on MATH500 and **+11.46%** on AIME2024.

## Inference

### Using LMDeploy

```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
from transformers import AutoTokenizer

model_path = "OpenMOSS-Team/DiRL-8B-Instruct"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Prepare prompts
prompts = [
    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
]
prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)

# Configure backend for DLLM inference
backend_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,
    dllm_denoising_steps=4,
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.9,
)

# Create inference pipeline
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(
        top_p=1.0,
        top_k=50,
        temperature=1.0,
        do_sample=False,  # greedy decoding
        max_new_tokens=8192,
    )
    
    outputs = pipe(prompts, gen_config=gen_config)
    
    for output in outputs:
        print(output.text)
```
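The `dllm_*` fields configure block-wise diffusion decoding: roughly, the model denoises `dllm_block_length` tokens at a time over up to `dllm_denoising_steps` refinement steps, and the `low_confidence_dynamic` strategy commits tokens once their confidence exceeds `dllm_confidence_threshold` (the exact semantics depend on your LMDeploy version). The sketch below reuses `model_path` and `prompts` from the snippet above and only varies parameters already shown there; the specific values are illustrative assumptions, not tuned recommendations.

```python
# Illustrative variants only: parameter names match the config above,
# the values here are assumptions rather than recommended settings.
fast_backend_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,
    dllm_denoising_steps=2,          # fewer refinement steps: faster, possibly less accurate
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.8,   # commit tokens at lower confidence
)

# Sampling instead of greedy decoding, e.g. to draw several candidate solutions
sample_config = GenerationConfig(
    top_p=0.95,
    top_k=50,
    temperature=0.7,
    do_sample=True,
    max_new_tokens=8192,
)

with pipeline(model_path, backend_config=fast_backend_config) as pipe:
    outputs = pipe(prompts, gen_config=sample_config)
    for output in outputs:
        print(output.text)
```

Greedy decoding (the first snippet) is the safer default for benchmark-style evaluation; sampling is mainly useful when generating multiple solutions per problem.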

## Performance

| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
|-------|---------|-------|----------|----------|---------------|---------|
| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
| Qwen2.5-32B-Instruct | 81.13 | **94.03** | 12.92 | 11.88 | 45.65 | 49.12 |
| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
| **DiRL-8B-Instruct** | **83.05** | 93.03 | **20.63** | **20.83** | **46.40** | **52.79** |
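The Average column matches the unweighted mean of the five benchmark scores in each row, e.g. for DiRL-8B-Instruct: (83.05 + 93.03 + 20.63 + 20.83 + 46.40) / 5 ≈ 52.79. A minimal sanity check in Python:

```python
# Recompute the Average column as the unweighted mean of the five benchmarks above.
scores = {
    "Qwen2.5-7B-Instruct":  [73.78, 89.78,  8.96,  5.63, 36.58],
    "Qwen2.5-32B-Instruct": [81.13, 94.03, 12.92, 11.88, 45.65],
    "SDAR-8B-Chat":         [71.85, 89.87,  9.17,  9.38, 36.03],
    "Trado-8B-Instruct":    [75.59, 91.06, 11.67, 15.00, 40.32],
    "DiRL-8B-Instruct":     [83.05, 93.03, 20.63, 20.83, 46.40],
}
for model, s in scores.items():
    print(f"{model}: {sum(s) / len(s):.2f}")
# DiRL-8B-Instruct: 52.79, matching the table.
```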

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{zhu2025dirl,
  title={DiRL: An Efficient Post-Training Framework for Diffusion Language Models},
  author={Zhu, Ying and Wan, Jiaxin and Liu, Xiaoran and He, Siyang and Wang, Qiqi and Guo, Xu and Liang, Tianyi and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
  year={2025},
  eprint={2512.22234},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.22234}
}
```