| Quant | Size | Description |
| --- | --- | --- |
| Q2_K | 1.17 GB | Not recommended for most people. Very low quality. |
| Q2_K_L | 1.23 GB | Not recommended for most people. Uses Q8_0 for the output and embedding tensors and Q2_K for everything else. Very low quality. |
| Q2_K_XL | 1.46 GB | Not recommended for most people. Uses F16 for the output and embedding tensors and Q2_K for everything else. Very low quality. |
| Q3_K_S | 1.33 GB | Not recommended for most people. Prefer any larger Q3_K quantization. Low quality. |
| Q3_K_M | 1.46 GB | Not recommended for most people. Low quality. |
| Q3_K_L | 1.57 GB | Not recommended for most people. Low quality. |
| Q3_K_XL | 1.63 GB | Not recommended for most people. Uses Q8_0 for the output and embedding tensors and Q3_K_L for everything else. Low quality. |
| Q3_K_XXL | 1.86 GB | Not recommended for most people. Uses F16 for the output and embedding tensors and Q3_K_L for everything else. Low quality. |
| Q4_K_S | 1.69 GB | Recommended. Slightly lower quality than Q4_K_M. |
| Q4_K_M | 1.78 GB | Recommended. Decent quality for most use cases. |
| Q4_K_L | 1.84 GB | Recommended. Uses Q8_0 for the output and embedding tensors and Q4_K_M for everything else. Decent quality. |
| Q4_K_XL | 2.07 GB | Recommended. Uses F16 for the output and embedding tensors and Q4_K_M for everything else. Decent quality. |
| Q5_K_S | 2.01 GB | Recommended. High quality. |
| Q5_K_M | 2.06 GB | Recommended. High quality. |
| Q5_K_L | 2.12 GB | Recommended. Uses Q8_0 for the output and embedding tensors and Q5_K_M for everything else. High quality. |
| Q5_K_XL | 2.35 GB | Recommended. Uses F16 for the output and embedding tensors and Q5_K_M for everything else. High quality. |
| Q6_K | 2.36 GB | Recommended. Very high quality. |
| Q6_K_L | 2.42 GB | Recommended. Uses Q8_0 for the output and embedding tensors and Q6_K for everything else. Very high quality. |
| Q6_K_XL | 2.65 GB | Recommended. Uses F16 for the output and embedding tensors and Q6_K for everything else. Very high quality. |
| Q8_0 | 3.05 GB | Recommended. Quality nearly identical to F16. |
| F16 | 5.74 GB | Not recommended. Overkill; prefer Q8_0. |
| Original (BF16) | 5.74 GB | Not recommended. Overkill; prefer Q8_0. |
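For reference, here is a minimal sketch of downloading one of these quants and running it locally with `huggingface_hub` and `llama-cpp-python`. The exact GGUF filename inside the repository is an assumption; check the repo's file listing before use.

```python
# Minimal sketch: fetch one quant from this repo and run it locally.
# Requires: pip install huggingface_hub llama-cpp-python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="Alcoft/HuggingFaceTB_SmolLM3-3B-GGUF",
    filename="HuggingFaceTB_SmolLM3-3B.Q4_K_M.gguf",  # assumed filename -- verify in the repo
)

llm = Llama(model_path=model_path, n_ctx=4096)
result = llm("Summarize what GGUF quantization does.", max_tokens=64)
print(result["choices"][0]["text"])
```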

Quantized using the TAO71-AI AutoQuantizer. See the original model card (HuggingFaceTB/SmolLM3-3B) for details on the base model.
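The `_L`, `_XL`, and `_XXL` rows above mix tensor types: the output and token-embedding tensors are kept at higher precision than the rest of the weights. As an illustration of that pattern only (not the AutoQuantizer's actual invocation), llama.cpp's `llama-quantize` tool exposes the same idea through per-tensor type overrides; the filenames below are assumptions.

```python
# Illustrative sketch of producing a Q4_K_L-style mix with llama.cpp's
# llama-quantize: Q8_0 for the output and embedding tensors, Q4_K_M for
# everything else. The files in this repo were made by the TAO71-AI
# AutoQuantizer, whose invocation may differ; filenames are assumptions.
import subprocess

subprocess.run(
    [
        "llama-quantize",
        "--output-tensor-type", "q8_0",    # output head kept at Q8_0
        "--token-embedding-type", "q8_0",  # token embeddings kept at Q8_0
        "SmolLM3-3B-F16.gguf",             # assumed input (full-precision GGUF)
        "SmolLM3-3B-Q4_K_L.gguf",          # assumed output filename
        "Q4_K_M",                          # base type for all other tensors
    ],
    check=True,
)
```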

Format: GGUF. Model size: 3B params. Architecture: smollm3.

