	---
	base_model: JetLM/SDAR-8B-Chat
	language:
	- en
	- zh
	license: apache-2.0
	tags:
	- math
	- reasoning
	- diffusion
	model_type: sdar
	pipeline_tag: text-generation
	library_name: transformers
	---

	<h1 align="center">DiRL-8B-Instruct</h1>

	<p align="center">
	<a href="https://arxiv.org/abs/2512.22234">
	<img src="https://img.shields.io/badge/arXiv-2512.22234-b31b1b.svg" alt="Paper on arXiv"/>
	</a>
	<a href="https://github.com/OpenMOSS/DiRL">
	<img src="https://img.shields.io/badge/GitHub-Code-black.svg?logo=github" alt="GitHub Code"/>
	</a>
	</p>

	## Introduction

	DiRL-8B-Instruct is an 8B-parameter diffusion language model specialized for mathematical reasoning. It is trained from [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat) with the [DiRL](https://github.com/OpenMOSS/DiRL) framework. After two-stage training (SFT followed by RL), DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks and outperforms the 32B model Qwen2.5-32B-Instruct on four of the five benchmarks reported under Performance.

	> Highlights
	>
	> * SOTA Performance: Achieves 83.05% on MATH500, 20.63% on AIME2024, and 20.83% on AIME2025, surpassing all 8B baselines.
	> * Training Framework: Trained with [DiRL](https://github.com/OpenMOSS/DiRL), an efficient training framework for diffusion language models.
	> * Strong Baseline: Built on [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat), improving on it by 11.20 percentage points on MATH500 and 11.46 percentage points on AIME2024.

	## Inference

	### Using LMDeploy

	```python
	from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
	from transformers import AutoTokenizer

	model_path = "OpenMOSS-Team/DiRL-8B-Instruct"

	# Load tokenizer
	tokenizer = AutoTokenizer.from_pretrained(model_path)

	# Prepare prompts
	prompts = [
	    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
	]
	prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)

	# Configure backend for DLLM inference
	backend_config = PytorchEngineConfig(
	    dtype="float16",
	    max_prefill_token_num=8192,
	    cache_max_entry_count=0.8,
	    dllm_block_length=4,
	    dllm_denoising_steps=4,
	    dllm_unmasking_strategy="low_confidence_dynamic",
	    dllm_confidence_threshold=0.9,
	)

	# Create inference pipeline
	with pipeline(model_path, backend_config=backend_config) as pipe:
	    gen_config = GenerationConfig(
	        top_p=1.0,
	        top_k=50,
	        temperature=1.0,
	        do_sample=False,  # greedy decoding
	        max_new_tokens=8192,
	    )

	    outputs = pipe(prompts, gen_config=gen_config)

	    for output in outputs:
	        print(output.text)
	```
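
The `dllm_*` settings above control how each block of tokens is denoised. The toy sketch below illustrates one plausible reading of a confidence-thresholded ("dynamic") unmasking loop. It is not LMDeploy's implementation: `denoise_block`, `toy_predict`, and the fallback rule (unmask the scheduled number of most confident tokens when too few clear the threshold) are assumptions made for illustration, and the fake "model" returns fixed confidences.

```python
import math

MASK = None  # placeholder for a position that is still masked


def denoise_block(predict, block_length=4, denoising_steps=4,
                  confidence_threshold=0.9):
    """Fill one block of masked positions in at most `denoising_steps` passes.

    `predict(block)` is a hypothetical stand-in for the model. It returns
    {position: (token, confidence)} for every still-masked position.
    """
    block = [MASK] * block_length
    # Scheduled number of tokens per step, so the block is always completed.
    per_step = math.ceil(block_length / denoising_steps)
    steps_used = 0
    for _ in range(denoising_steps):
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        if not masked:
            break  # block finished early
        proposals = predict(block)
        confident = [i for i in masked
                     if proposals[i][1] >= confidence_threshold]
        if len(confident) >= per_step:
            chosen = confident  # accept every confident token at once
        else:
            # Assumed fallback: take the most confident candidates.
            ranked = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
            chosen = ranked[:per_step]
        for i in chosen:
            block[i] = proposals[i][0]
        steps_used += 1
    return block, steps_used


def toy_predict(block):
    """Deterministic fake model with fixed tokens and confidences."""
    tokens = ["x", "=", "7", "."]
    confidences = [0.95, 0.97, 0.60, 0.92]
    return {i: (tokens[i], confidences[i])
            for i, tok in enumerate(block) if tok is MASK}


block, steps = denoise_block(toy_predict)
print(block, steps)  # ['x', '=', '7', '.'] 2
```

In this toy run, three positions clear the 0.9 threshold in the first pass and the remaining one is filled in the second, so the block finishes in 2 of the 4 allowed steps. Raising the threshold to 1.0 forces one token per step, which is the sense in which a dynamic strategy can trade decoding steps against confidence.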

	## Performance

	| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
	|-------|---------|-------|----------|----------|---------------|---------|
	| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
	| Qwen2.5-32B-Instruct | 81.13 | 94.03 | 12.92 | 11.88 | 45.65 | 49.12 |
	| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
	| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
	| DiRL-8B-Instruct | 83.05 | 93.03 | 20.63 | 20.83 | 46.40 | 52.79 |
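
The Average column and the gains over SDAR-8B-Chat can be checked directly from the per-benchmark scores. The short sketch below assumes the Average is the unweighted mean of the five benchmarks, which matches the reported values. The variable names are illustrative, and the numbers are copied from the table above.

```python
# Scores in table order: MATH500, GSM8K, AIME2024, AIME2025, OlympiadBench.
scores = {
    "Qwen2.5-7B-Instruct": [73.78, 89.78, 8.96, 5.63, 36.58],
    "Qwen2.5-32B-Instruct": [81.13, 94.03, 12.92, 11.88, 45.65],
    "SDAR-8B-Chat": [71.85, 89.87, 9.17, 9.38, 36.03],
    "Trado-8B-Instruct": [75.59, 91.06, 11.67, 15.00, 40.32],
    "DiRL-8B-Instruct": [83.05, 93.03, 20.63, 20.83, 46.40],
}

# Unweighted mean over the five benchmarks, rounded as in the table.
averages = {name: round(sum(vals) / len(vals), 2)
            for name, vals in scores.items()}

# Per-benchmark gain of DiRL-8B-Instruct over its base model, in points.
gains = [round(d - s, 2)
         for d, s in zip(scores["DiRL-8B-Instruct"], scores["SDAR-8B-Chat"])]

# Number of benchmarks on which DiRL-8B-Instruct beats the 32B baseline.
wins_vs_32b = sum(d > q for d, q in zip(scores["DiRL-8B-Instruct"],
                                        scores["Qwen2.5-32B-Instruct"]))

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
print("Gains over SDAR-8B-Chat:", gains)
print("Benchmarks won against Qwen2.5-32B-Instruct:", wins_vs_32b)
```

The gains on MATH500 and AIME2024 come out to 11.20 and 11.46 points, and GSM8K is the only benchmark on which Qwen2.5-32B-Instruct remains ahead.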

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@misc{zhu2025dirl,
	  title={DiRL: An Efficient Post-Training Framework for Diffusion Language Models},
	  author={Zhu, Ying and Wan, Jiaxin and Liu, Xiaoran and He, Siyang and Wang, Qiqi and Guo, Xu and Liang, Tianyi and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
	  year={2025},
	  eprint={2512.22234},
	  archivePrefix={arXiv},
	  primaryClass={cs.CL},
	  url={https://arxiv.org/abs/2512.22234}
	}
	```