# WeDLM-8B-Instruct-GGUF

First GGUF quantization of Tencent's WeDLM-8B-Instruct!

Quantized using llama.cpp b7688.

Original model: [tencent/WeDLM-8B-Instruct](https://huggingface.co/tencent/WeDLM-8B-Instruct)

## About

WeDLM is an 8B-parameter instruction-tuned model from Tencent that supports English and Chinese. Its architecture features QK Norm, similar to Qwen3.

This GGUF uses the `qwen3` architecture identifier for maximum llama.cpp compatibility.
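If you want to confirm the embedded metadata yourself, the `gguf` Python package that ships with llama.cpp provides a `gguf-dump` tool; a quick sketch (installing via `pip install gguf` is an assumption about your setup):

```bash
pip install gguf

# Dump metadata only and check the architecture key
gguf-dump --no-tensors WeDLM-8B-Instruct-Q4_K_M.gguf | grep general.architecture
```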

## Available Files

| Filename | Quant | Size | Description |
|----------|-------|------|-------------|
| WeDLM-8B-Instruct-Q4_K_M.gguf | Q4_K_M | 4.68 GB | Good quality, recommended for most use cases |
| WeDLM-8B-Instruct-Q8_0.gguf | Q8_0 | 8.11 GB | High quality, best accuracy |
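To fetch a single quant without pulling the whole repository, `huggingface-cli` works; a minimal sketch for Q4_K_M:

```bash
pip install -U huggingface_hub

huggingface-cli download feedseawave/WeDLM-8B-Instruct-GGUF \
  WeDLM-8B-Instruct-Q4_K_M.gguf --local-dir .
```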

## Performance Benchmarks

### CPU (16 threads, Zen 4)

| Quant | Prompt Processing | Text Generation |
|-------|-------------------|-----------------|
| Q4_K_M | 88.65 t/s | 8.27 t/s |
| Q8_0 | 50.80 t/s | 5.17 t/s |

### GPU (RTX 4060 Laptop, 8 GB VRAM)

| Quant | Prompt Processing | Text Generation |
|-------|-------------------|-----------------|
| Q4_K_M | 1833.84 t/s | 37.08 t/s |

*Q4_K_M is recommended for the RTX 4060, since it fits entirely in 8 GB of VRAM.*
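To measure comparable numbers on your own hardware, llama.cpp's `llama-bench` tool can be used; a minimal sketch (thread count and `-ngl` should match your system):

```bash
# CPU-only run, 16 threads
./llama-bench -m WeDLM-8B-Instruct-Q4_K_M.gguf -t 16 -ngl 0

# GPU run, all layers offloaded
./llama-bench -m WeDLM-8B-Instruct-Q4_K_M.gguf -ngl 99
```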

## Prompt Format (ChatML)

```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```

## Usage

### llama.cpp

```bash
./llama-cli -m WeDLM-8B-Instruct-Q4_K_M.gguf \
  -p "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n" \
  -n 256 -ngl 99
```
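If you would rather not build ChatML strings by hand, `llama-server` exposes an OpenAI-compatible endpoint that applies the chat template for you; a sketch (the port and `max_tokens` are arbitrary choices):

```bash
./llama-server -m WeDLM-8B-Instruct-Q4_K_M.gguf -ngl 99 --port 8080

# In another shell:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 256}'
```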

### Ollama

```bash
# Create Modelfile (the stop parameter keeps generation from running past <|im_end|>)
cat > Modelfile << 'EOF'
FROM ./WeDLM-8B-Instruct-Q4_K_M.gguf
TEMPLATE "<|im_start|>user\n{{ .Prompt }}<|im_end|>\n<|im_start|>assistant\n"
PARAMETER stop "<|im_end|>"
EOF

ollama create wedlm -f Modelfile
ollama run wedlm
```
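If you want a default system prompt baked in, Ollama's Modelfile also supports a `SYSTEM` directive; a sketch extending the template above to full ChatML (the system prompt text is just an example):

```bash
cat > Modelfile << 'EOF'
FROM ./WeDLM-8B-Instruct-Q4_K_M.gguf
SYSTEM "You are a helpful AI assistant."
TEMPLATE "<|im_start|>system\n{{ .System }}<|im_end|>\n<|im_start|>user\n{{ .Prompt }}<|im_end|>\n<|im_start|>assistant\n"
PARAMETER stop "<|im_end|>"
EOF

ollama create wedlm -f Modelfile
```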

## Hardware Requirements

| Quant | Min VRAM | Recommended RAM |
|-------|----------|-----------------|
| Q4_K_M | 6 GB | 8 GB |
| Q8_0 | 10 GB | 12 GB |
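If a quant does not fit entirely in VRAM (e.g. Q8_0 on an 8 GB card), llama.cpp can split the work: `-ngl` sets how many of the 36 layers are offloaded to the GPU, and the rest run from system RAM. A sketch (24 is an arbitrary starting point; raise it until VRAM is full):

```bash
./llama-cli -m WeDLM-8B-Instruct-Q8_0.gguf -ngl 24 \
  -p "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n" -n 256
```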

## Model Architecture

- Parameters: 8.19B
- Layers: 36
- Hidden size: 4096
- Attention heads: 32 (8 KV heads, GQA)
- Context length: 16384 (see the note after this list)
- Features: QK Norm, SwiGLU, RoPE (theta = 1M)
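A note on context: llama.cpp's default context size is usually smaller than a model's maximum, so pass `-c` explicitly if you need the full window (KV-cache memory grows with it):

```bash
./llama-cli -m WeDLM-8B-Instruct-Q4_K_M.gguf -c 16384 -ngl 99 \
  -p "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n" -n 256
```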

## Acknowledgements

Thanks to the llama.cpp project for the quantization tooling and to Tencent for releasing the original model.

## Disclaimer

This is an unofficial quantization. For official support, please refer to the original model repository.
