# WeDLM-8B

WeDLM-8B is a diffusion language model that performs parallel decoding under standard causal attention, initialized from Qwen3-8B.

This is the base (pretrained) version. For the instruction-tuned version, see WeDLM-8B-Instruct.

📄 Paper (Coming Soon) | 🌐 Project Page | 💻 GitHub

## Model Details

| Attribute | Value |
| --- | --- |
| Initialized From | Qwen3-8B |
| Parameters | 8B |
| Context Length | 32,768 tokens |

## Quick Start (Recommended)

For fast inference, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-8B")

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(max_tokens=256))

print(outputs[0]["text"])
```
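Since `generate` takes a list, several prompts can be decoded in one call. The following is a minimal sketch built only from the API shown above, assuming `generate` returns one output per prompt in input order:

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-8B")

# Batch several prompts into a single generate() call.
prompts = [
    "The capital of France is",
    "In thermodynamics, entropy measures",
]
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))

# Assumption: outputs align with the input prompt order.
for prompt, out in zip(prompts, outputs):
    print(prompt, "->", out["text"])
```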

## HuggingFace Transformers

For training or simple forward passes, you can load the model via Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-8B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-8B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)  # outputs.logits: (batch, seq_len, vocab_size)
```

โš ๏ธ Note: The HuggingFace interface is for training/forward pass convenience. For optimized inference throughput, use the wedlm engine above.

## Performance

| Benchmark | Qwen3-8B | WeDLM-8B |
| --- | --- | --- |
| ARC-C (0-shot) | 92.66 | 92.92 |
| GSM8K (3-shot) | 85.97 | 90.20 |
| MATH (4-shot) | 50.80 | 53.60 |
| HumanEval (4-shot) | 68.90 | 75.00 |
| MMLU (5-shot) | 74.03 | 75.46 |
| **Average** | 72.61 | 74.72 |

## Citation

Coming soon.

## License

Apache 2.0
