Paper: Qwen3 Technical Report (arXiv:2505.09388)
This model is a layer-pruned version of Qwen3-8B-Base, produced with the LaCo (Layer Collapse) structured pruning method: groups of adjacent transformer layers are collapsed into a single layer by accumulating their parameter differences, and a merge is kept only if the model's output representations stay sufficiently similar (see the sketch after the table below).
| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen3-8B-Base |
| Pruning Method | LaCo (Layer Collapse) |
| Original Layers | 36 |
| Pruned Layers | 26 |
| Layers Removed | 10 |
| Compression | 27.8% |
| Parameters | ~5.8B (reduced from ~8B) |
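For context, the core LaCo operation, the Reserving-Differences-while-Seeking-Common (RDSC) layer merge from the cited LaCo paper, folds a group of consecutive layers into the lowest one by summing each higher layer's parameter difference from it. A minimal sketch in plain PyTorch; the function name and the per-layer state-dict representation are illustrative, not taken from the actual pruning code:

```python
import torch

def collapse_layers(layer_params: list[dict], start: int, c: int) -> dict:
    """RDSC merge: fold layers [start, start + c) into a single layer,
    theta* = theta_start + sum_k (theta_{start+k} - theta_start).
    `layer_params` holds one state dict per layer, all with identical keys."""
    base = layer_params[start]
    merged = {name: p.clone() for name, p in base.items()}
    for k in range(1, c):
        for name, p in layer_params[start + k].items():
            # Reserve each higher layer's difference from the base layer.
            merged[name] += p - base[name]
    return merged
```

The merged layer replaces all `c` originals, so each accepted merge shrinks the model by `c - 1` layers.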
This pruned model has significantly reduced factual knowledge capabilities. It performs at near-random levels on knowledge-intensive benchmarks like MMLU.
| Use Case | Status |
|---|---|
| Physical reasoning tasks | ✅ Good (82.6% retained) |
| Reading comprehension | ⚠️ Acceptable (74.3% retained) |
| Common sense reasoning | ⚠️ Degraded (61.8% retained) |
| Factual question answering | ❌ Not recommended |
| Knowledge-intensive tasks | ❌ Not recommended |
Recommendation: Fine-tune this model on your target domain before deployment.
LaCo pruning configuration (a sketch of the merge loop follows the table):

| Parameter | Value | Description |
|---|---|---|
| MERGE_LAYERS (C) | 3 | Layers merged per operation |
| LOWEST_LAY (L) | 4 | Minimum layer index for merging |
| HIGHEST_LAY (H) | 28 | Maximum layer index for merging |
| INTERVAL (I) | 2 | Minimum gap between merge points |
| THRESHOLD (T) | 0.85 | Cosine similarity threshold |
| MAX_COMPRESSION | 30% | Maximum allowed compression |
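These hyperparameters drive an iterative, top-down merge search: starting near layer `H`, try collapsing `C` consecutive layers, keep the merge only if the pruned model's outputs stay above the similarity threshold `T` on calibration text, move at least `I` layers before the next attempt, and never merge below layer `L` or past `MAX_COMPRESSION`. A hedged sketch of that loop, reusing `collapse_layers` from above; `similarity_fn` and the exact control flow are assumptions, not the original implementation:

```python
def laco_prune(layers, similarity_fn, C=3, L=4, H=28, I=2, T=0.85, max_comp=0.30):
    """`layers` is a list of per-layer state dicts; `similarity_fn(candidate)`
    scores how close the candidate model's outputs stay to the original's."""
    n_original = len(layers)
    pos = min(H, len(layers) - C)  # highest legal merge start
    while pos >= L:
        if 1 - (len(layers) - (C - 1)) / n_original > max_comp:
            break  # the next merge would exceed the compression budget
        merged = collapse_layers(layers, pos, C)
        candidate = layers[:pos] + [merged] + layers[pos + C:]
        if similarity_fn(candidate) >= T:
            layers = candidate            # accept: C layers became one
            pos -= I                      # enforce the minimum interval
        else:
            pos -= 1                      # reject: slide the window down
        pos = min(pos, len(layers) - C)
    return layers
```

With `C = 3`, each accepted merge removes two layers, which matches the run statistics below: five successful merges account for the ten layers removed (36 → 26).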
Pruning run statistics:

| Metric | Value |
|---|---|
| Successful Merges | 5 |
| Rejected Merges | 0 |
| Total Iterations | 6 |
| Final Compression | 27.8% |
Cosine similarities recorded during pruning (all above the 0.85 threshold):

| Statistic | Value |
|---|---|
| Average | 0.9680 |
| Min | 0.9492 |
| Max | 0.9766 |
Individual similarities: [0.9492, 0.9727, 0.9609, 0.9766, 0.9688, 0.9648, 0.9648, 0.9766, 0.9727, 0.9727]
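These scores compare the original and pruned models' output representations on calibration text. A minimal way to compute such a similarity; the calibration sentence is illustrative, and whether the pruning run scored the final hidden state or some other representation is not stated:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-Base")
orig = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B-Base", torch_dtype="auto")
pruned = AutoModelForCausalLM.from_pretrained(
    "Mercity/Qwen3-8B-LaCo-Pruned", torch_dtype="auto", trust_remote_code=True
)

ids = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    h_orig = orig(**ids, output_hidden_states=True).hidden_states[-1]
    h_pruned = pruned(**ids, output_hidden_states=True).hidden_states[-1]

# Mean cosine similarity of the last hidden state across token positions.
sim = torch.nn.functional.cosine_similarity(h_orig, h_pruned, dim=-1).mean()
print(f"cosine similarity: {sim.item():.4f}")
```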
Perplexity (lower is better; a sketch of the computation follows the table):

| Model | Perplexity | Ratio vs. Original |
|---|---|---|
| Original (Qwen3-8B-Base) | 26.19 | 1.00× |
| Pruned (this model) | 71.48 | 2.73× |
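Perplexity here is the exponential of the mean token-level cross-entropy on held-out text. The evaluation corpus is not stated, so the snippet below only shows the mechanics (the function name and the 2048-token window are illustrative):

```python
import torch

def perplexity(model, tokenizer, text, max_length=2048):
    ids = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=max_length).input_ids.to(model.device)
    with torch.no_grad():
        # With labels=ids, transformers shifts internally and returns
        # the mean cross-entropy over predicted tokens.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()
```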
Benchmark accuracy (higher is better):

| Benchmark | Original | Pruned | Retention | Status |
|---|---|---|---|---|
| PIQA | 79.54% | 65.67% | 82.6% | ✅ Good |
| BoolQ | 83.09% | 61.77% | 74.3% | ⚠️ Acceptable |
| HellaSwag | 78.55% | 48.52% | 61.8% | ⚠️ Degraded |
| MMLU (5-shot) | 76.89% | 25.12% | 32.7% | ❌ Near random |
Original scores are taken from the Qwen3 Technical Report.
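These accuracies can be reproduced with a standard evaluation harness, e.g. EleutherAI's lm-evaluation-harness; whether that harness was used for the numbers above is not stated, so treat this as a suggestion rather than the original setup:

```python
# pip install lm-eval
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=Mercity/Qwen3-8B-LaCo-Pruned,trust_remote_code=True,dtype=auto",
    tasks=["piqa", "boolq", "hellaswag", "mmlu"],  # MMLU above was run 5-shot
)
# Metric keys such as "acc,none" depend on the harness version.
print({task: r.get("acc,none") for task, r in results["results"].items()})
```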
Basic usage with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Mercity/Qwen3-8B-LaCo-Pruned"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Text generation
prompt = "The process of photosynthesis"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
To cut memory further, the model can be loaded in 4-bit with bitsandbytes:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "Mercity/Qwen3-8B-LaCo-Pruned",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)
```
To restore performance after pruning, fine-tune with LoRA on your target domain:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# Fine-tune on OpenOrca, Alpaca, or domain-specific data
```
Alternatively, use the original Qwen3-8B-Base as a teacher and distill its knowledge back into the pruned student.
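One common recipe is logit distillation: minimize the KL divergence between teacher and student next-token distributions, blended with the standard language-modeling loss. A minimal sketch; the temperature and weighting are illustrative choices, not a prescribed recipe:

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL (teacher -> student) with the usual LM cross-entropy."""
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (T * T)  # standard temperature-squared scaling of the soft loss
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * kl + (1 - alpha) * ce
```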
Model architecture after pruning:

| Attribute | Value |
|---|---|
| Architecture | Transformer decoder-only |
| Parameters | ~5.8B |
| Layers | 26 |
| Hidden Size | 4096 |
| Attention Heads (Q) | 32 |
| Attention Heads (KV) | 8 (GQA) |
| Intermediate Size | 12288 |
| Vocabulary Size | 151,669 |
| Max Context Length | 32,768 tokens |
| Precision | bfloat16 |
If you use this model, please cite the original LaCo paper and Qwen3:
```bibtex
@article{yang2024laco,
  title={LaCo: Large Language Model Pruning via Layer Collapse},
  author={Yang, Yifei and Cao, Zouying and Zhao, Hai},
  journal={arXiv preprint arXiv:2402.11187},
  year={2024}
}

@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388}
}
```
License: Apache 2.0 (same as the base Qwen3 model).
Base model: Qwen/Qwen3-8B-Base