Active filters: kv-cache
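All of the models below carry the `kv-cache` tag. As a reminder of the common theme, here is a minimal toy sketch of what a decoder KV cache does (illustrative NumPy code only, not the implementation of any listed model): at each decode step the new token's key/value vectors are appended, so earlier tokens' projections are never recomputed.

```python
import numpy as np

# Toy KV cache for autoregressive decoding (illustrative sketch).
class KVCache:
    def __init__(self):
        self.keys = []    # one (n_heads, head_dim) array per cached token
        self.values = []

    def append(self, k, v):
        # Called once per generated token with that token's K/V projections.
        self.keys.append(k)
        self.values.append(v)

    def stacked(self):
        # Returns (seq_len, n_heads, head_dim) tensors for attention.
        return np.stack(self.keys), np.stack(self.values)

rng = np.random.default_rng(0)
n_heads, head_dim = 4, 8
cache = KVCache()
for _ in range(5):  # five decode steps
    cache.append(rng.normal(size=(n_heads, head_dim)),
                 rng.normal(size=(n_heads, head_dim)))
K, V = cache.stacked()
print(K.shape)  # (5, 4, 8)
```

Because this cache grows linearly with sequence length, compressing or quantizing it (as several repositories below do) directly reduces decode-time memory.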
atomicmilkshake/llama-cpp-turboquant-binaries
alexcovo/qwen35-9b-mlx-turboquant-tq3 • Text Generation • 2B • Updated • 3.32k
MarkShark2/omnivoice-onnx-kv-b1-fp16 • Text-to-Speech • Updated • 1
fromthesky/PLDR-LLM-v51-104M • Text Generation • 0.1B • Updated • 11
fromthesky/PLDR-LLM-v51-110M-1 • Text Generation • 0.1B • Updated • 7
fromthesky/PLDR-LLM-v51-110M-2 • Text Generation • 0.1B • Updated • 15
fromthesky/PLDR-LLM-v51-110M-3 • Text Generation • 0.1B • Updated • 10
fromthesky/PLDR-LLM-v51-110M-4 • Text Generation • 0.1B • Updated • 10
fromthesky/PLDR-LLM-v51-110M-5 • Text Generation • 0.1B • Updated • 10
fromthesky/PLDR-LLM-v51-DAG-110M • Text Generation • 0.1B • Updated • 12
fromthesky/PLDR-LLM-v51G-106M-1 • Text Generation • 0.1B • Updated • 9
fromthesky/PLDR-LLM-v51G-106M-2 • Text Generation • 0.1B • Updated • 8
fromthesky/PLDR-LLM-v51G-106M-3 • Text Generation • 0.1B • Updated • 7
fromthesky/PLDR-LLM-v51G-106M-test • Text Generation • 0.1B • Updated • 4
fromthesky/PLDR-LLM-v52-81M-FT-SC-1 • Text Classification • 81M • Updated • 8
fromthesky/PLDR-LLM-v52-81M-FT-QA-1 • Question Answering • 81M • Updated • 6
fromthesky/PLDR-LLM-v52-81M-FT-TC-1 • Token Classification • 81M • Updated • 5
fromthesky/PLDR-LLM-v52-110M-1 • Text Generation • 0.1B • Updated • 6
nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Qwen3-32B-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Qwen3-32B-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Qwen3-32B-FP8-dynamic-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Qwen3-32B-FP8-dynamic-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Llama-3.3-70B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor • Updated
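The nm-testing repository names distinguish two FP8 KV-cache quantization granularities: "Per-Tensor" (one scale for the whole K or V cache tensor) versus "Per-Head" (one scale per attention head). The sketch below illustrates the difference with a simulated E4M3-style value range and a uniform rounding grid; the actual checkpoints' quantization recipes may differ. When heads have mismatched magnitudes, a per-head scale typically yields lower quantization error.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fake_quantize(x, scale):
    # Simulated quantize-dequantize: scale down, round to a uniform grid,
    # clip to the representable range, then rescale (FP8's real grid is
    # non-uniform, but this captures the granularity effect).
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q * scale

rng = np.random.default_rng(0)
# Toy K cache: (seq_len, n_heads, head_dim), heads with very different ranges.
k = rng.normal(size=(16, 4, 8)) * np.array([0.1, 1.0, 5.0, 50.0])[None, :, None]

# Per-tensor: a single scale derived from the whole tensor's max magnitude.
scale_tensor = np.abs(k).max() / FP8_E4M3_MAX
err_tensor = np.abs(fake_quantize(k, scale_tensor) - k).mean()

# Per-head: one scale per attention head (reduce over seq and dim axes).
scale_head = np.abs(k).max(axis=(0, 2), keepdims=True) / FP8_E4M3_MAX
err_head = np.abs(fake_quantize(k, scale_head) - k).mean()

print(err_tensor, err_head)  # per-head error is lower for mismatched heads
```

Per-head scales cost a few extra scale values per layer but protect small-magnitude heads from being rounded toward zero by a scale dominated by the largest head.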