---
base_model: JetLM/SDAR-8B-Chat
language:
- en
- zh
license: apache-2.0
tags:
- math
- reasoning
- diffusion
model_type: sdar
pipeline_tag: text-generation
library_name: transformers
---

<h1 align="center">DiRL-8B-Instruct</h1>

<p align="center">
  <a href="https://arxiv.org/abs/2512.22234">
    <img src="https://img.shields.io/badge/arXiv-2512.22234-b31b1b.svg" alt="Paper on arXiv"/>
  </a>
  <a href="https://github.com/OpenMOSS/DiRL">
    <img src="https://img.shields.io/badge/GitHub-Code-black.svg?logo=github" alt="GitHub Code"/>
  </a>
</p>

## Introduction

**DiRL-8B-Instruct** is an 8B-parameter diffusion language model specialized for mathematical reasoning. It is trained with the [DiRL](https://github.com/OpenMOSS/DiRL) framework on top of [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat). Through two-stage training (SFT followed by RL), DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks and outperforms Qwen2.5-32B-Instruct on four of the five benchmarks reported in the Performance section.

> **Highlights**
>
> * **SOTA Performance:** Achieves **83.05%** on MATH500, **20.63%** on AIME2024, and **20.83%** on AIME2025, surpassing all 8B baselines.
> * **Training Framework:** Trained with [DiRL](https://github.com/OpenMOSS/DiRL), an efficient training framework for diffusion language models.
> * **Strong Baseline:** Built on [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat), gaining **+11.20** points on MATH500 and **+11.46** points on AIME2024.

## Inference

### Using LMDeploy

```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
from transformers import AutoTokenizer

model_path = "OpenMOSS-Team/DiRL-8B-Instruct"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Prepare prompts: one conversation (a list of messages) per entry
prompts = [
    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
]
# Render each conversation to a prompt string with the model's chat template
prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)

# Configure backend for DLLM inference
backend_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,
    dllm_denoising_steps=4,
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.9,
)

# Create inference pipeline
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(
        top_p=1.0,
        top_k=50,
        temperature=1.0,
        do_sample=False,  # greedy decoding
        max_new_tokens=8192,
    )

    outputs = pipe(prompts, gen_config=gen_config)

    for output in outputs:
        print(output.text)
```
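
The `prompts` list in the snippet above holds one conversation per entry, and each conversation is a list of `{"role", "content"}` messages. As a minimal sketch of how several questions could be batched in that shape, the helper below wraps each question in a single-turn conversation. The name `build_conversations` and the second example question are hypothetical and not part of the DiRL or LMDeploy code.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_conversations(questions: List[str]) -> List[List[Message]]:
    """Wrap each question in a single-turn conversation (one user message).

    Hypothetical helper for illustration; it only builds plain Python lists.
    """
    return [[{"role": "user", "content": question}] for question in questions]


questions = [
    "Solve: If x + 5 = 12, what is x?",
    "What is the sum of the first 10 positive integers?",
]
conversations = build_conversations(questions)

# One conversation per question, each containing exactly one user message.
for conversation in conversations:
    print(conversation)
```

The resulting list can be passed to `tokenizer.apply_chat_template(conversations, tokenize=False, add_generation_prompt=True)` in place of the single-prompt list, and the rendered prompts can then be sent to the pipeline in one call.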

## Performance

| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
|-------|---------|-------|----------|----------|---------------|---------|
| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
| Qwen2.5-32B-Instruct | 81.13 | **94.03** | 12.92 | 11.88 | 45.65 | 49.12 |
| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
| **DiRL-8B-Instruct** | **83.05** | 93.03 | **20.63** | **20.83** | **46.40** | **52.79** |
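
The Average column and the gains quoted in the highlights can be reproduced from the per-benchmark scores. The sketch below assumes that Average is the unweighted mean of the five benchmark scores and that gains are differences in percentage points. The scores are copied from the table, while the variable and function names are illustrative.

```python
# Scores copied from the table, in column order.
benchmarks = ["MATH500", "GSM8K", "AIME2024", "AIME2025", "OlympiadBench"]
scores = {
    "Qwen2.5-7B-Instruct": [73.78, 89.78, 8.96, 5.63, 36.58],
    "Qwen2.5-32B-Instruct": [81.13, 94.03, 12.92, 11.88, 45.65],
    "SDAR-8B-Chat": [71.85, 89.87, 9.17, 9.38, 36.03],
    "Trado-8B-Instruct": [75.59, 91.06, 11.67, 15.00, 40.32],
    "DiRL-8B-Instruct": [83.05, 93.03, 20.63, 20.83, 46.40],
}


def average(values):
    """Unweighted mean, rounded to two decimals like the table."""
    return round(sum(values) / len(values), 2)


averages = {model: average(values) for model, values in scores.items()}

# Gain of DiRL-8B-Instruct over its base model, in percentage points.
gains = {
    name: round(dirl - base, 2)
    for name, dirl, base in zip(
        benchmarks, scores["DiRL-8B-Instruct"], scores["SDAR-8B-Chat"]
    )
}

# Benchmarks on which DiRL-8B-Instruct beats the 32B baseline.
wins_vs_32b = [
    name
    for name, dirl, big in zip(
        benchmarks, scores["DiRL-8B-Instruct"], scores["Qwen2.5-32B-Instruct"]
    )
    if dirl > big
]

print(averages)
print(gains)
print(wins_vs_32b)
```

Under these assumptions the computed averages match the table, and the only benchmark where Qwen2.5-32B-Instruct stays ahead is GSM8K.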

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{zhu2025dirl,
  title={DiRL: An Efficient Post-Training Framework for Diffusion Language Models},
  author={Zhu, Ying and Wan, Jiaxin and Liu, Xiaoran and He, Siyang and Wang, Qiqi and Guo, Xu and Liang, Tianyi and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
  year={2025},
  eprint={2512.22234},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.22234}
}
```