# WeDLM-8B
WeDLM-8B is a diffusion language model, initialized from Qwen3-8B, that performs parallel decoding under standard causal attention.
This is the base (pretrained) version. For the instruction-tuned version, see WeDLM-8B-Instruct.
📄 Paper (Coming Soon) | 🌐 Project Page | 💻 GitHub
## Model Details

| Attribute | Value |
|---|---|
| Initialized From | Qwen3-8B |
| Parameters | 8B |
| Context Length | 32,768 |
## Quick Start

For fast inference, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-8B")

prompt = "The theory of relativity states that"
outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
print(outputs[0]["text"])
```
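Since `generate` takes a list of prompts, several completions can be decoded in one call. A minimal sketch reusing only the interface shown above (the second prompt is illustrative):

```python
from wedlm import LLM, SamplingParams

llm = LLM(model="tencent/WeDLM-8B")

# Batch several prompts in a single generate call; the output format is
# assumed to match the quick-start example above (a list of {"text": ...}).
prompts = [
    "The theory of relativity states that",
    "A diffusion language model generates text by",
]
outputs = llm.generate(prompts, SamplingParams(max_tokens=128))
for prompt, result in zip(prompts, outputs):
    print(prompt, "->", result["text"])
```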
For training or simple forward passes, you can load the model via Transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-8B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-8B",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The theory of relativity", return_tensors="pt").to(model.device)
outputs = model(**inputs)
```
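To sanity-check the forward pass, you can inspect the next-token prediction. This sketch assumes the remote-code model returns standard causal-LM-style `logits` of shape `[batch, seq, vocab]`:

```python
import torch

# Greedy next-token from the forward pass above. Assumes a standard
# CausalLMOutput-style `logits` tensor; the parallel diffusion decoding
# itself is only available through the wedlm engine.
next_token_logits = outputs.logits[0, -1]
next_token_id = int(torch.argmax(next_token_logits))
print(tokenizer.decode([next_token_id]))
```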
⚠️ Note: The Hugging Face interface is for training and forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above.
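If you fine-tune through the Transformers path, a conventional causal-LM loss call is the simplest starting point. This is only a sketch: it assumes the remote code follows the usual Hugging Face `labels` convention, whereas WeDLM's actual diffusion training objective may require the official training code instead.

```python
# Hypothetical fine-tuning-style step. Assumes the remote-code model accepts
# `labels` and returns a loss, as standard Hugging Face causal LMs do; the
# real WeDLM diffusion objective may differ.
batch = tokenizer("The theory of relativity states that", return_tensors="pt").to(model.device)
out = model(**batch, labels=batch["input_ids"])
print(out.loss)
```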
## Benchmarks

Results on standard zero- and few-shot benchmarks, compared with the Qwen3-8B base model:

| Benchmark | Qwen3-8B | WeDLM-8B |
|---|---|---|
| ARC-C (0-shot) | 92.66 | 92.92 |
| GSM8K (3-shot) | 85.97 | 90.20 |
| MATH (4-shot) | 50.80 | 53.60 |
| HumanEval (4-shot) | 68.90 | 75.00 |
| MMLU (5-shot) | 74.03 | 75.46 |
| Average | 72.61 | 74.72 |
## License

Apache 2.0