Plano-Orchestrator-30B-A3B-GGUF

🧠 Model Overview

Plano-Orchestrator-30B-A3B-GGUF is a quantized version of Plano-Orchestrator-30B-A3B, optimized for efficient inference with reduced memory usage and faster runtime while preserving as much of the original model quality as possible.

This repository provides multiple quantized variants suitable for:

  • Local inference
  • Low-VRAM GPUs
  • CPU-only environments

πŸ”— Original Model


πŸ“¦ Quantization Details

  • Quantization method: GGUF
  • Quantization tool: llama.cpp
  • Precision: Mixed (2–8 bit, depending on variant)
  • Activation aware: No (weight-only quantization)
  • Group size: 256 (K-quant variants)
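
To illustrate what "weight-only quantization with a group size of 256" means, here is a simplified sketch: weights are split into groups of 256 values, each group stored as low-bit signed integers plus one floating-point scale. This is an illustration of the general technique only, not llama.cpp's actual K-quant kernel or storage layout.

```python
import numpy as np

def quantize_groupwise(weights, bits=4, group_size=256):
    """Simplified weight-only, group-wise symmetric quantization.

    Each group of `group_size` weights shares one fp32 scale; weights are
    stored as signed integers in [-2^(bits-1), 2^(bits-1) - 1].
    Illustration only -- not llama.cpp's real K-quant format.
    """
    qmax = 2 ** (bits - 1) - 1
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize_groupwise(q, scales):
    """Recover approximate fp32 weights from integers and per-group scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

# Round-trip error shrinks as bit width grows
rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q4, s4 = quantize_groupwise(w, bits=4)
q8, s8 = quantize_groupwise(w, bits=8)
err4 = np.abs(dequantize_groupwise(q4, s4) - w).mean()
err8 = np.abs(dequantize_groupwise(q8, s8) - w).mean()
```

The per-group scale is why lower-bit variants (Q2_K, Q3_K) lose more quality: fewer integer levels must cover each group's full dynamic range.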

πŸ“¦ Available Quantized Files

| Quant Format | File Name | Approx. Size | VRAM / RAM Needed | Notes |
|---|---|---|---|---|
| Q2_K | plano-orchestrator-q2_k.gguf | ~11.3 GB | ~14 GB | Extreme compression; noticeable quality loss |
| Q3_K_S | plano-orchestrator-q3_k_s.gguf | ~13.3 GB | ~15.4 GB | Smaller, faster, lower quality |
| Q3_K_M | plano-orchestrator-q3_k_m.gguf | ~14.7 GB | ~16 GB | Better balance than Q3_K_S |
| Q3_K_L | plano-orchestrator-q3_k_l.gguf | ~15.9 GB | ~18 GB | Highest-quality 3-bit variant |
| Q4_0 | plano-orchestrator-q4_0.gguf | ~17.3 GB | ~19.3 GB | Legacy format; simpler quantization |
| Q4_K_S | plano-orchestrator-q4_k_s.gguf | ~17.5 GB | ~19.5 GB | Smaller grouped 4-bit |
| Q4_K_M | plano-orchestrator-q4_k_m.gguf | – | – | Recommended default |
| Q5_0 | plano-orchestrator-q5_0.gguf | ~21 GB | ~23 GB | Higher quality, larger size |
| Q5_K_S | plano-orchestrator-q5_k_s.gguf | ~21.1 GB | ~23 GB | Efficient high-quality variant |
| Q5_K_M | plano-orchestrator-q5_k_m.gguf | – | – | Near-FP16 quality |
| Q6_K | plano-orchestrator-q6_k.gguf | ~25.1 GB | ~27 GB | Minimal quantization loss |
| Q8_0 | plano-orchestrator-q8_0.gguf | ~32.5 GB | ~36 GB | Maximum quality; large memory |

πŸ’‘ Recommendation: Start with Q4_K_M for the best quality-to-performance ratio.


πŸš€ Usage Example

llama.cpp

```shell
./main -m plano-orchestrator-q6_k.gguf -p "Your prompt here" -n 256
```

Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="plano-orchestrator-q6_k.gguf",  # path to the downloaded GGUF file
    n_ctx=4096,    # context window in tokens
    n_threads=8,   # CPU threads to use for inference
)

output = llm("Your prompt here", max_tokens=256)
print(output["choices"][0]["text"])  # print only the completion text
```

πŸ™‹ Contact

Maintainer: M Mashhudur Rahim [XythicK]

Role:
Independent Machine Learning Researcher & Model Infrastructure Maintainer

(Focused on model quantization, optimization, and efficient deployment)

For issues, improvement requests, or additional quantization formats, please use the Hugging Face Discussions or Issues tab.

❀️ Acknowledgements

Thanks to the original model authors for their ongoing contributions to open AI research, and to Hugging Face and the open-source machine learning community for providing the tools and platforms that make efficient model sharing and deployment possible.

📊 Model Details

  • Format: GGUF
  • Model size: 31B params
  • Architecture: qwen3moe