Qwable-9B-Claude-Fable-5-GGUF
Developed by Empero
GGUF quantizations of empero-ai/Qwable-9B-Claude-Fable-5 for llama.cpp, Ollama, LM Studio, and other GGUF runtimes. This repo ships a vision projector (mmproj), so the model runs as a full multimodal (image + text) assistant, not just text.
Qwable-9B-Claude-Fable-5 is a full-parameter fine-tune of Qwen3.5-9B on agentic coding and reasoning traces distilled from Claude Fable 5 and a GPT-5.5 terminal agent. For full training details and the complete evaluation, see the base model card.
Early release. Strong coding and agentic behavior out of the box; a full benchmark suite is underway and will be published. See Provenance & licensing.
Files
Text weights — pick one quant
| File | Quant | Size | Notes |
|---|---|---|---|
| Qwable-9B-Claude-Fable-5-Q4_K_M.gguf | Q4_K_M | 5.3 GB | recommended default; smallest, runs on ~6–8 GB VRAM |
| Qwable-9B-Claude-Fable-5-Q5_K_M.gguf | Q5_K_M | 6.1 GB | balanced quality / size |
| Qwable-9B-Claude-Fable-5-Q6_K.gguf | Q6_K | 6.9 GB | high quality |
| Qwable-9B-Claude-Fable-5-Q8_0.gguf | Q8_0 | 8.9 GB | near-lossless |
| Qwable-9B-Claude-Fable-5-bf16.gguf | BF16 | 17 GB | full precision (conversion base) |
Vision projector — for image input
| File | Size | Notes |
|---|---|---|
| mmproj-Qwable-9B-Claude-Fable-5-f16.gguf | 876 MB | CLIP vision encoder; required for images, pairs with any quant above |
Text-only use needs just a quant. For image understanding, download both a text quant and the mmproj.
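The file-selection rule above (one text quant, plus the mmproj only for images) can be scripted. The sketch below is a minimal example. It assumes the repo id `empero-ai/Qwable-9B-Claude-Fable-5-GGUF` (taken from the Ollama tag below) and uses `hf_hub_download` from the `huggingface_hub` package. The helper names `files_to_fetch` and `download` are hypothetical and exist only for illustration.

```python
# Minimal sketch (hypothetical helpers): choose the GGUF files to fetch,
# then download them with huggingface_hub. Filenames mirror the tables above.
REPO_ID = "empero-ai/Qwable-9B-Claude-Fable-5-GGUF"
MODEL_STEM = "Qwable-9B-Claude-Fable-5"
QUANTS = ("Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0", "bf16")
MMPROJ_FILE = f"mmproj-{MODEL_STEM}-f16.gguf"


def files_to_fetch(quant: str = "Q4_K_M", vision: bool = False) -> list[str]:
    """Return the filenames needed: one text quant, plus the mmproj for image input."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {QUANTS}")
    files = [f"{MODEL_STEM}-{quant}.gguf"]
    if vision:
        # The projector pairs with any text quant and is only needed for images.
        files.append(MMPROJ_FILE)
    return files


def download(quant: str = "Q4_K_M", vision: bool = False, local_dir: str = ".") -> list[str]:
    """Download the selected files and return their local paths."""
    # Imported here so files_to_fetch() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
        for name in files_to_fetch(quant, vision)
    ]


if __name__ == "__main__":
    print(files_to_fetch("Q4_K_M", vision=True))
```

For a text-only setup, call `download("Q6_K")`. For image input, call `download("Q6_K", vision=True)`.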
Usage
llama.cpp — text
```bash
llama-cli -m Qwable-9B-Claude-Fable-5-Q4_K_M.gguf --jinja \
  -p "Write a Python function that merges two sorted lists." \
  --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.05 -n 2048
```
llama.cpp — multimodal (image + text)
```bash
llama-mtmd-cli -m Qwable-9B-Claude-Fable-5-Q4_K_M.gguf \
  --mmproj mmproj-Qwable-9B-Claude-Fable-5-f16.gguf \
  --image photo.jpg -p "Describe this image." \
  --temp 0.6 --top-p 0.95 --top-k 20 -n 512
```
Ollama
```bash
ollama run hf.co/empero-ai/Qwable-9B-Claude-Fable-5-GGUF:Q4_K_M
```
Or via a Modelfile (pulls in the vision projector for image support):
```
FROM ./Qwable-9B-Claude-Fable-5-Q4_K_M.gguf
FROM ./mmproj-Qwable-9B-Claude-Fable-5-f16.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
```
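The same sampling settings can be passed per request through Ollama's local REST API. The sketch below is minimal and makes several assumptions:

- Ollama is running on its default port 11434.
- The model was pulled with the `ollama run` tag above. If you built it from a Modelfile, substitute the name you gave to `ollama create`.

It uses the `/api/generate` endpoint with `"stream": false`. Images go in the `images` field as base64 strings. The helper names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "hf.co/empero-ai/Qwable-9B-Claude-Fable-5-GGUF:Q4_K_M"

# Same values as the Modelfile PARAMETER lines above.
SAMPLING = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.05}


def encode_image(path: str) -> str:
    """Read an image file and return it base64-encoded, as /api/generate expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def build_payload(prompt: str, images_b64: Optional[List[str]] = None) -> dict:
    """Build a non-streaming /api/generate request body."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False, "options": dict(SAMPLING)}
    if images_b64:
        payload["images"] = images_b64
    return payload


def generate(prompt: str, images_b64: Optional[List[str]] = None,
             url: str = OLLAMA_URL, timeout: float = 600.0) -> str:
    """Send the request and return the generated text (still including any <think> block)."""
    data = json.dumps(build_payload(prompt, images_b64)).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage: `generate("Describe this image.", [encode_image("photo.jpg")])`.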
Sampling & output format
- Sampling (Qwen3.5 recommended):
  - `temp 1.0` for general tasks, `temp 0.6` for precise coding.
  - `top_p 0.95`, `top_k 20`, `min_p 0`.
  - Use `repeat_penalty 1.05` (a small bump from Qwen's default 1.0) to avoid rare non-terminating reasoning loops.
  - Allow a generous `-n` / `max_new_tokens`.
- Reasoning model: every response opens with a `<think>...</think>` block before the final answer. Parse and strip that span for end users.
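Stripping the reasoning span can be done with a small parser like the one below. This is a sketch, not a runtime-provided API. It assumes the tags appear literally in the returned text. It also handles two edge cases:

- Only `</think>` is present, which can happen when the chat template already emitted the opening tag.
- The block was never closed, for example because the output hit the token limit.

```python
import re

# One <think>...</think> span; DOTALL lets the reasoning cross newlines.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer)."""
    match = _THINK_BLOCK.search(text)
    if match:
        answer = text[: match.start()] + text[match.end():]
        return match.group(1).strip(), answer.strip()
    if "</think>" in text:
        # Opening tag was supplied by the prompt/template; only the close is in the output.
        reasoning, answer = text.split("</think>", 1)
        return reasoning.strip(), answer.strip()
    if "<think>" in text:
        # Opened but never closed: likely truncated, so there is no final answer yet.
        return text.split("<think>", 1)[1].strip(), ""
    return "", text.strip()


def strip_reasoning(text: str) -> str:
    """Return only the user-facing answer."""
    return split_reasoning(text)[1]
```

An empty answer from a truncated completion is a signal to raise `-n` / `max_new_tokens` rather than to show the raw reasoning.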
Model details
- Developed by: Empero
- Base model: Qwen3.5-9B, a dense, natively multimodal model. It uses a hybrid attention stack, with Gated DeltaNet linear-attention and gated full-attention layers in a 3:1 ratio. It has a ~152k vocabulary and long native context.
- Fine-tune type: full parameter (all text-backbone weights trained), assistant-only loss. The vision tower was left unchanged from the base. Vision therefore works (via the included mmproj), but it was inherited rather than specifically tuned.
- Format: GGUF (text quants + CLIP mmproj), converted and quantized with llama.cpp.
- Languages: primarily English.
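The 3:1 hybrid ratio in the base-model bullet can be made concrete with a small illustration of how such a stack interleaves layer types. The sketch makes two assumptions. First, the full-attention layer closes each group of four. Second, the depth is arbitrary. The real layer count and ordering come from the Qwen3.5-9B config, not from this hypothetical helper.

```python
def hybrid_layer_pattern(num_layers: int, linear_per_full: int = 3) -> list[str]:
    """Illustrative layout: `linear_per_full` Gated DeltaNet (linear-attention)
    layers per gated full-attention layer, repeated through the stack.

    Assumption: the full-attention layer is the last one in each group.
    """
    period = linear_per_full + 1
    return [
        "full_attention" if (i + 1) % period == 0 else "linear_attention"
        for i in range(num_layers)
    ]


if __name__ == "__main__":
    # 8 is an arbitrary illustrative depth, not the real Qwen3.5-9B depth.
    for idx, kind in enumerate(hybrid_layer_pattern(8)):
        print(idx, kind)
```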
Evaluation
The evaluation below was measured on the unquantized fine-tune. Quantized variants are very close at Q8_0/Q6_K and degrade gradually toward Q4_K_M — expect a small quality drop at the lower quants.
Training quality was tracked via held-out validation loss / token-accuracy on a 100-example split (80% Fable / 20% terminal), plus a qualitative generation review:
| Step | eval loss | eval token-acc |
|---|---|---|
| 100 | 0.743 | 0.784 |
| 300 (≈ epoch 1) | 0.714 | 0.791 |
| 500 | 0.713 | 0.791 |
No overfitting: held-out loss decreased, then plateaued (~0.71) through epoch 2. It never rose, even as train loss fell to ~0.64.

In a 34-prompt qualitative review, roughly 27/34 responses were clean and correct. The model was strongest on coding and terminal/agentic tasks, where it reached for current tooling (`ss` over `netstat`, `git-filter-repo`, Argon2id). It also showed security-aware judgment, such as rotating a leaked key first and using constant-time comparison. Full transcripts: sample_generations.md.
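The "no overfitting" reading of the table can be checked mechanically. The sketch below tests that held-out loss never rose between evaluations. It also converts each loss to per-token perplexity, which assumes the reported eval loss is mean token-level cross-entropy in nats (the usual Transformers/TRL convention, not stated in this card). Values are copied from the table above.

```python
import math

# (step, held-out loss, held-out token accuracy), copied from the table above.
EVAL = [(100, 0.743, 0.784), (300, 0.714, 0.791), (500, 0.713, 0.791)]


def never_rose(losses, tol=0.0):
    """True if every eval loss is <= the previous one (plus an optional tolerance)."""
    return all(b <= a + tol for a, b in zip(losses, losses[1:]))


def perplexity(loss):
    """Per-token perplexity, assuming `loss` is mean cross-entropy in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    losses = [loss for _, loss, _ in EVAL]
    print("held-out loss never rose:", never_rose(losses))
    for step, loss, acc in EVAL:
        print(f"step {step}: loss {loss:.3f}, perplexity {perplexity(loss):.2f}, token-acc {acc:.3f}")
```

Under that assumption, the plateau at ~0.71 corresponds to a per-token perplexity of about 2.0 on the held-out split.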
Limitations
- Reasoning model. Each response opens with a `<think>` block; strip it for end users and allow generous output length. Use `repeat_penalty` ≈ 1.05 for consistently crisp completions.
- Strongest within its domain (coding / agentic / reasoning). For general-knowledge or long-form factual questions, verify specifics as with any 9B model.
- Reflects its base and teachers. A distillation fine-tune of Qwen3.5-9B on Claude Fable 5 and GPT-5.5 traces; it carries their style and limits and received no extra safety tuning. Add your own review/safety layer for production.
- Quantization. Lower quants (esp. Q4_K_M) trade a little accuracy for size; use Q6_K/Q8_0 when quality matters most.
Quantization
Converted from the fine-tuned weights with llama.cpp's `convert_hf_to_gguf.py`, then quantized with `llama-quantize`. The BF16 GGUF is the conversion base; the K-quants are derived from it. The mmproj is the base Qwen3.5-VL vision encoder (unchanged by fine-tuning). All files were verified to load and generate in llama.cpp, and both text (code, reasoning) and image understanding were confirmed.
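The conversion pipeline described above (HF weights, then a BF16 GGUF, then each quant) can be reproduced roughly as follows. This is a sketch, not the exact commands used for this release, and it makes these assumptions:

- A local llama.cpp checkout at `llama.cpp/`, with `llama-quantize` on `PATH`.
- The fine-tuned HF weights are in a hypothetical `Qwable-9B-Claude-Fable-5/` directory.
- The mmproj export is not covered here.

Commands are printed by default and only executed when `dry_run=False`.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical local paths: adjust to your setup.
LLAMA_CPP = Path("llama.cpp")
HF_MODEL_DIR = Path("Qwable-9B-Claude-Fable-5")
STEM = "Qwable-9B-Claude-Fable-5"
QUANT_TYPES = ["Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]


def build_commands(out_dir: Path = Path(".")) -> list[list[str]]:
    """Convert HF weights to a BF16 GGUF, then derive each quant from that file."""
    base = out_dir / f"{STEM}-bf16.gguf"
    cmds = [[
        sys.executable, str(LLAMA_CPP / "convert_hf_to_gguf.py"), str(HF_MODEL_DIR),
        "--outtype", "bf16", "--outfile", str(base),
    ]]
    for qtype in QUANT_TYPES:
        # llama-quantize <input.gguf> <output.gguf> <type>
        cmds.append(["llama-quantize", str(base), str(out_dir / f"{STEM}-{qtype}.gguf"), qtype])
    return cmds


def run_all(dry_run: bool = True) -> None:
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_all(dry_run=True)
```

Quantizing every variant from the single BF16 file, rather than chaining quants, avoids compounding rounding error.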
Provenance & licensing
Weights are released under Apache-2.0, inherited from the Qwen3.5-9B base. The fine-tuning data comes from generated traces of Claude Fable 5 and GPT-5.5 (via the linked public datasets). Because those traces originate from third-party assistants, the providers' terms may apply to downstream training and distillation — if you plan to build on this model commercially, confirm your use aligns with those terms. Shared with the community for research and experimentation, as-is.
Support / Donate
If this model helped you, consider supporting the project:
- BTC: bc1qx6zepu6sfkvshgdmc4ewu6pk6rpadvpgffpp7v
- LTC: ltc1qv2mefzps2vtjcpwfx8xxdrpplrcvltswm68r7x
- XMR: 42Dbm5xg5Nq26fdyzfEU7KBnAJfhi7Cvz5J2ex5CzHXkfKuNEJzYCcmJ1GTbgjFZ5MBx72sdG1G9239Cd6rsZfv4QeDkYJY
Acknowledgements
- Developed and released by Empero
- Base model: Qwen3.5-9B (Alibaba Qwen team)
- Datasets: Glint-Research/Fable-5-traces, Roman1111111/gpt5.5-terminal
- Tooling: llama.cpp, TRL, Transformers