# WeDLM-7B-Instruct

WeDLM-7B-Instruct is an instruction-tuned diffusion language model, fine-tuned from WeDLM-7B, that performs parallel decoding under standard causal attention.
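
Conceptually, a diffusion LM fills many masked positions per forward pass instead of emitting one token at a time. The toy sketch below only illustrates that idea; the `mask_id`, confidence rule, and commit schedule are invented for the demo and are not WeDLM's actual procedure:

```python
import torch

def toy_parallel_decode(model, tokens, mask_id=0, steps=4):
    """Illustrative diffusion-style decoding: predict all masked slots at
    once, then commit only the most confident predictions each step."""
    tokens = tokens.clone()
    for _ in range(steps):
        masked = tokens == mask_id
        if not masked.any():
            break
        logits = model(tokens)                    # [seq_len, vocab]: one pass covers every position
        conf, pred = logits.softmax(-1).max(-1)   # per-position confidence and argmax token
        conf[~masked] = -1.0                      # only still-masked slots may be committed
        k = max(1, int(masked.sum().item()) // steps)
        keep = conf.topk(k).indices               # the k most confident masked positions
        tokens[keep] = pred[keep]
    return tokens
```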

For the base (pretrained) version, see WeDLM-7B.

📄 Paper (Coming Soon) | 🌐 Project Page | 💻 [GitHub](https://github.com/tencent/WeDLM)

## Model Details

| Attribute | Value |
|---|---|
| Base Model | WeDLM-7B |
| Parameters | 7B |
| Context Length | 32,768 |

## Quick Start (Recommended)

For fast inference, use the `wedlm` engine:

```bash
pip install git+https://github.com/tencent/WeDLM.git
```

```python
from transformers import AutoTokenizer
from wedlm import LLM, SamplingParams

# Load the wedlm inference engine and the matching tokenizer
llm = LLM(model="tencent/WeDLM-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)

# Format a single-turn prompt with the model's chat template
prompt = "Explain the difference between machine learning and deep learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=512))
print(outputs[0]["text"])
```
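
Because `generate` takes a list, several prompts can be batched in one call (a sketch reusing `llm` and `tokenizer` from above; the list-in/list-out convention is inferred from the example):

```python
questions = ["What is a diffusion language model?", "Write a haiku about GPUs."]
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": q}], tokenize=False, add_generation_prompt=True
    )
    for q in questions
]
outputs = llm.generate(prompts, SamplingParams(temperature=0.2, max_tokens=256))
for out in outputs:
    print(out["text"])
```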

### Multi-turn Conversation

```python
messages = [
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language known for its simplicity and readability."},
    {"role": "user", "content": "Show me a hello world example."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=256))
print(outputs[0]["text"])
```
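
To continue the dialogue, append the generated reply and the next user turn, then re-apply the chat template (a sketch following the same pattern as above):

```python
messages.append({"role": "assistant", "content": outputs[0]["text"]})
messages.append({"role": "user", "content": "Now add a comment to that example."})
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], SamplingParams(temperature=0.2, max_tokens=256))
print(outputs[0]["text"])
```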

## HuggingFace Transformers

For training or simple forward passes:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/WeDLM-7B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/WeDLM-7B-Instruct",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model(**inputs)  # single forward pass; no generation loop
```

> ⚠️ **Note:** The HuggingFace interface is for training/forward-pass convenience. For optimized inference throughput, use the `wedlm` engine above.

## Performance

| Benchmark | Qwen2.5-7B-Instruct | WeDLM-7B-Instruct |
|---|---|---|
| ARC-C (0-shot) | 86.09 | 89.59 |
| GSM8K (3-shot) | 89.91 | 87.57 |
| MATH (4-shot) | 45.00 | 55.40 |
| HumanEval (4-shot) | 76.22 | 75.00 |
| MMLU (5-shot) | 71.98 | 70.52 |

## Citation

Coming soon.

## License

Apache 2.0
