Models: 167

Active filters: int4

RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16

Text Generation • 11B • Updated Sep 22, 2025 • 3.55k • 3

RedHatAI/Mixtral-8x22B-v0.1-quantized.w4a16

18B • Updated Jan 3, 2025 • 3

RedHatAI/Mixtral-8x7B-v0.1-quantized.w4a16

6B • Updated Mar 1, 2025 • 3

RedHatAI/QwQ-32B-Preview-quantized.w4a16

6B • Updated Jan 3, 2025 • 3

RedHatAI/Llama-3.1-Nemotron-70B-Instruct-HF-quantized.w4a16

Text Generation • 11B • Updated Jan 3, 2025 • 4

nintwentydo/pixtral-12b-2409-W4A16-G128

Image-Text-to-Text • 3B • Updated Jan 5, 2025 • 11 • 2

RedHatAI/granite-3.1-8b-instruct-quantized.w4a16

Text Generation • 1B • Updated Sep 22, 2025 • 403 • 1

RedHatAI/granite-3.1-2b-instruct-quantized.w4a16

Text Generation • 0.5B • Updated Feb 28, 2025 • 60

RedHatAI/DeepSeek-V2.5-1210-quantized.w4a16

Text Generation • 32B • Updated Jan 11, 2025 • 5

RedHatAI/DeepSeek-Coder-V2-Instruct-0724-quantized.w4a16

Text Generation • 32B • Updated Jan 12, 2025 • 12 • 1

RedHatAI/granite-3.1-2b-base-quantized.w4a16

Text Generation • 0.5B • Updated Feb 28, 2025 • 8

RedHatAI/granite-3.1-8b-base-quantized.w4a16

Text Generation • 1B • Updated Sep 22, 2025 • 37 • 1

ModelCloud/Qwen2.5-0.5B-Instruct-gptqmodel-w4a16

Text Generation • 0.5B • Updated Oct 19, 2025 • 8 • 1

inarikami/DeepSeek-R1-Distill-Qwen-32B-AWQ

Text Generation • 33B • Updated Jan 23, 2025 • 546 • 10

ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v1

Text Generation • 8B • Updated Jan 24, 2025 • 19 • 5

ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v2

Text Generation • 8B • Updated Jan 24, 2025 • 226 • 7

RedHatAI/Mistral-Small-24B-Instruct-2501-quantized.w4a16

Text Generation • 4B • Updated Oct 29, 2025 • 296 • 1

RedHatAI/Phi-3-vision-128k-instruct-W4A16-G128

Text Generation • 1B • Updated Feb 10, 2025 • 8 • 1

RedHatAI/whisper-large-v2-W4A16-G128

Automatic Speech Recognition • 0.3B • Updated Jan 31, 2025 • 7 • 1

RedHatAI/DeepSeek-R1-Distill-Llama-8B-quantized.w4a16

Text Generation • 2B • Updated Feb 27, 2025 • 456

RedHatAI/DeepSeek-R1-Distill-Qwen-7B-quantized.w4a16

Text Generation • 2B • Updated Feb 27, 2025 • 457 • 2

RedHatAI/DeepSeek-R1-Distill-Qwen-14B-quantized.w4a16

Text Generation • 3B • Updated Feb 27, 2025 • 269 • 1

RedHatAI/DeepSeek-R1-Distill-Qwen-32B-quantized.w4a16

Text Generation • 6B • Updated Feb 27, 2025 • 479 • 5

RedHatAI/DeepSeek-R1-Distill-Llama-70B-quantized.w4a16

Text Generation • 11B • Updated Feb 27, 2025 • 401 • 5

RedHatAI/DeepSeek-R1-Distill-Qwen-1.5B-quantized.w4a16

Text Generation • 0.6B • Updated Feb 27, 2025 • 8 • 1

RedHatAI/Pixtral-Large-Instruct-2411-hf-quantized.w4a16

Image-Text-to-Text • 19B • Updated Mar 31, 2025 • 428

RedHatAI/pixtral-12b-quantized.w4a16

Image-to-Text • 3B • Updated Feb 26, 2025 • 40 • 1

nm-testing/whisper-large-v3.w4a16

Automatic Speech Recognition • 0.3B • Updated Feb 14, 2025 • 5 • 2

context-labs/neuralmagic-llama-3.1-8b-instruct-Q4KM

Text Generation • 8B • Updated Feb 23, 2025 • 3

ModelCloud/QwQ-32B-gptqmodel-4bit-vortex-v1

Text Generation • 33B • Updated Mar 9, 2025 • 29 • 11