WeDLM-8B-Instruct-GGUF
First GGUF quantization of Tencent WeDLM-8B-Instruct!
Quantized using llama.cpp b7688.
Original model: tencent/WeDLM-8B-Instruct
About
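WeDLM-8B-Instruct is an 8B-parameter instruction-tuned model from Tencent that supports English and Chinese. Like Qwen3, it uses QK Norm (normalization of the query and key projections in attention).
This GGUF uses the `qwen3` architecture identifier, so llama.cpp loads it with its existing Qwen3 implementation for maximum compatibility.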
Available Files
| Filename | Quant | Size | Description |
|---|---|---|---|
| WeDLM-8B-Instruct-Q4_K_M.gguf | Q4_K_M | 4.68 GB | Good quality, recommended for most use cases |
| WeDLM-8B-Instruct-Q8_0.gguf | Q8_0 | 8.11 GB | High quality, best accuracy |
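The file sizes can be sanity-checked by converting them to bits per weight. The sketch below is a rough check, not part of the release; the function name `bpw` is hypothetical, and it assumes the sizes in the table are GiB (2^30 bytes), the unit llama-bench reports. Under that assumption Q8_0 comes out at about 8.5 bits per weight, which matches the Q8_0 layout (32 int8 values plus one fp16 scale per block).

```bash
#!/usr/bin/env bash
# Rough bits-per-weight check for a GGUF file (hypothetical helper).
# Assumption: size is in GiB (2^30 bytes), as llama-bench reports it.
# Usage: bpw <size_gib> <params_billions>
bpw() {
  awk -v s="$1" -v p="$2" 'BEGIN {
    bytes = s * 1024 * 1024 * 1024          # GiB -> bytes
    printf "%.2f\n", bytes * 8 / (p * 1e9)  # bits per parameter
  }'
}

bpw 4.68 8.19   # Q4_K_M -> 4.91
bpw 8.11 8.19   # Q8_0   -> 8.51
```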
Performance Benchmarks
CPU (16 threads, Zen4)
| Quant | Prompt Processing | Text Generation |
|---|---|---|
| Q4_K_M | 88.65 t/s | 8.27 t/s |
| Q8_0 | 50.80 t/s | 5.17 t/s |
GPU (RTX 4060 Laptop, 8GB VRAM)
| Quant | Prompt Processing | Text Generation |
|---|---|---|
| Q4_K_M | 1833.84 t/s | 37.08 t/s |
Q4_K_M is recommended for the RTX 4060 because it fits in 8 GB of VRAM.
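The throughput figures translate into an approximate end-to-end time per request: prompt tokens divided by the prompt-processing rate, plus generated tokens divided by the generation rate. This is a back-of-the-envelope sketch; `est_latency` is a hypothetical helper, and it ignores model load time and any slowdown of generation at long contexts.

```bash
#!/usr/bin/env bash
# Approximate request latency from benchmark throughput (hypothetical helper):
#   time ~ prompt_tokens / pp_rate + generated_tokens / tg_rate
# Ignores model load time and long-context slowdown.
# Usage: est_latency <prompt_tokens> <gen_tokens> <pp_tps> <tg_tps>
est_latency() {
  awk -v pt="$1" -v gt="$2" -v pp="$3" -v tg="$4" \
    'BEGIN { printf "%.1f s\n", pt / pp + gt / tg }'
}

est_latency 512 256 88.65 8.27      # Q4_K_M, CPU (Zen4, 16 threads) -> 36.7 s
est_latency 512 256 1833.84 37.08   # Q4_K_M, RTX 4060 Laptop        -> 7.2 s
```

Generation dominates in both cases, so the text-generation rate is the number to watch for interactive use.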
Prompt Format (ChatML)
```text
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```
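When scripting against a raw-completion interface, the ChatML prompt shown in the format above can be assembled with `printf`. This is a minimal single-turn sketch; `chatml_prompt` is a hypothetical helper name, not part of llama.cpp or Ollama.

```bash
#!/usr/bin/env bash
# Build a single-turn ChatML prompt (hypothetical helper).
# Usage: chatml_prompt "<system message>" "<user message>"
chatml_prompt() {
  local system="$1" user="$2"
  # Messages are passed through %s, so their contents are not escape-processed
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$system" "$user"
}

chatml_prompt "You are a helpful AI assistant." "Hello!"
```

Multi-turn conversations repeat the user and assistant blocks, each closed with `<|im_end|>`, and end with an open `<|im_start|>assistant` line for the model to complete.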
Usage
llama.cpp
```bash
# -n 256: generate up to 256 tokens; -ngl 99: offload all layers to the GPU
./llama-cli -m WeDLM-8B-Instruct-Q4_K_M.gguf \
  -p "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n" \
  -n 256 -ngl 99
```
Ollama
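```bash
# Create a Modelfile that uses the ChatML template
cat > Modelfile << 'EOF'
FROM ./WeDLM-8B-Instruct-Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
EOF
ollama create wedlm -f Modelfile
ollama run wedlm
```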
Hardware Requirements
| Quant | Min VRAM | Recommended RAM |
|---|---|---|
| Q4_K_M | 6 GB | 8 GB |
| Q8_0 | 10 GB | 12 GB |
Model Architecture
- Parameters: 8.19B
- Layers: 36
- Hidden Size: 4096
- Attention Heads: 32 (8 KV heads, GQA)
- Context Length: 16384
- Features: QK Norm, SwiGLU, RoPE (theta=1M)
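The architecture figures above determine how much memory the KV cache needs on top of the weights. The sketch below is a back-of-the-envelope estimate; `kv_cache_gib` is a hypothetical helper, and it assumes a head dimension of 4096 / 32 = 128 and llama.cpp's default f16 cache (2 bytes per element).

```bash
#!/usr/bin/env bash
# Estimate KV-cache size for this architecture (hypothetical helper).
# Assumptions: head_dim = hidden_size / attention_heads = 4096 / 32 = 128,
# and an f16 cache (2 bytes per element).
# Usage: kv_cache_gib <context_tokens>
kv_cache_gib() {
  local ctx="$1"
  local layers=36 kv_heads=8 head_dim=128 bytes_per_elem=2
  # Leading factor 2: one K tensor and one V tensor per layer
  local bytes=$(( 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem ))
  awk -v b="$bytes" 'BEGIN { printf "%.2f GiB\n", b / (1024 ^ 3) }'
}

kv_cache_gib 4096
kv_cache_gib 16384   # full context -> 2.25 GiB
```

With the 4.68 GiB Q4_K_M weights, a full 16384-token f16 cache brings the total to roughly 6.9 GiB before compute buffers. GQA keeps this small: with 8 KV heads instead of 32, the cache is a quarter of the size full multi-head attention would need. A shorter context (llama.cpp's `-c` option) reduces it proportionally.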
Acknowledgements
- Original model: Tencent WeDLM Team
- Inference framework: llama.cpp
Disclaimer
This is an unofficial quantization. For official support, please refer to the original model repository.
Model tree for feedseawave/WeDLM-8B-Instruct-GGUF
Base model
tencent/WeDLM-8B-Instruct