Active filters: kv-cache
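All of the models below carry the `kv-cache` tag. As a reminder of the common theme, here is a minimal toy sketch of what a decoder KV cache does (illustrative NumPy code only, not the implementation of any listed model): at each decode step the new token's key/value vectors are appended, so earlier tokens' projections are never recomputed.

```python
import numpy as np

# Toy KV cache for autoregressive decoding (illustrative sketch).
class KVCache:
    def __init__(self):
        self.keys = []    # one (n_heads, head_dim) array per cached token
        self.values = []

    def append(self, k, v):
        # Called once per generated token with that token's K/V projections.
        self.keys.append(k)
        self.values.append(v)

    def stacked(self):
        # Returns (seq_len, n_heads, head_dim) tensors for attention.
        return np.stack(self.keys), np.stack(self.values)

rng = np.random.default_rng(0)
n_heads, head_dim = 4, 8
cache = KVCache()
for _ in range(5):  # five decode steps
    cache.append(rng.normal(size=(n_heads, head_dim)),
                 rng.normal(size=(n_heads, head_dim)))
K, V = cache.stacked()
print(K.shape)  # (5, 4, 8)
```

Because this cache grows linearly with sequence length, compressing or quantizing it (as several repositories below do) directly reduces decode-time memory.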
atomicmilkshake/llama-cpp-turboquant-binaries
alexcovo/qwen35-9b-mlx-turboquant-tq3 • Text Generation • 2B • Updated • 3.32k
MarkShark2/omnivoice-onnx-kv-b1-fp16 • Text-to-Speech • Updated • 1
fromthesky/PLDR-LLM-v51-104M • Text Generation • 0.1B • Updated • 11
fromthesky/PLDR-LLM-v51-110M-1 • Text Generation • 0.1B • Updated • 7
fromthesky/PLDR-LLM-v51-110M-2 • Text Generation • 0.1B • Updated • 15
fromthesky/PLDR-LLM-v51-110M-3 • Text Generation • 0.1B • Updated • 10
fromthesky/PLDR-LLM-v51-110M-4 • Text Generation • 0.1B • Updated • 10
fromthesky/PLDR-LLM-v51-110M-5 • Text Generation • 0.1B • Updated • 10
fromthesky/PLDR-LLM-v51-DAG-110M • Text Generation • 0.1B • Updated • 12
fromthesky/PLDR-LLM-v51G-106M-1 • Text Generation • 0.1B • Updated • 9
fromthesky/PLDR-LLM-v51G-106M-2 • Text Generation • 0.1B • Updated • 8
fromthesky/PLDR-LLM-v51G-106M-3 • Text Generation • 0.1B • Updated • 7
fromthesky/PLDR-LLM-v51G-106M-test • Text Generation • 0.1B • Updated • 4
fromthesky/PLDR-LLM-v52-81M-FT-SC-1 • Text Classification • 81M • Updated • 8
fromthesky/PLDR-LLM-v52-81M-FT-QA-1 • Question Answering • 81M • Updated • 6
fromthesky/PLDR-LLM-v52-81M-FT-TC-1 • Token Classification • 81M • Updated • 5
fromthesky/PLDR-LLM-v52-110M-1 • Text Generation • 0.1B • Updated • 6
nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Qwen3-32B-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Qwen3-32B-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Qwen3-32B-FP8-dynamic-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Qwen3-32B-FP8-dynamic-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Tensor • Updated
nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Head • Updated
nm-testing/Llama-3.3-70B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor • Updated
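The nm-testing repository names distinguish two FP8 KV-cache quantization granularities: "Per-Tensor" (one scale for the whole K or V cache tensor) versus "Per-Head" (one scale per attention head). The sketch below illustrates the difference with a simulated E4M3-style value range and a uniform rounding grid; the actual checkpoints' quantization recipes may differ. When heads have mismatched magnitudes, a per-head scale typically yields lower quantization error.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fake_quantize(x, scale):
    # Simulated quantize-dequantize: scale down, round to a uniform grid,
    # clip to the representable range, then rescale (FP8's real grid is
    # non-uniform, but this captures the granularity effect).
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q * scale

rng = np.random.default_rng(0)
# Toy K cache: (seq_len, n_heads, head_dim), heads with very different ranges.
k = rng.normal(size=(16, 4, 8)) * np.array([0.1, 1.0, 5.0, 50.0])[None, :, None]

# Per-tensor: a single scale derived from the whole tensor's max magnitude.
scale_tensor = np.abs(k).max() / FP8_E4M3_MAX
err_tensor = np.abs(fake_quantize(k, scale_tensor) - k).mean()

# Per-head: one scale per attention head (reduce over seq and dim axes).
scale_head = np.abs(k).max(axis=(0, 2), keepdims=True) / FP8_E4M3_MAX
err_head = np.abs(fake_quantize(k, scale_head) - k).mean()

print(err_tensor, err_head)  # per-head error is lower for mismatched heads
```

Per-head scales cost a few extra scale values per layer but protect small-magnitude heads from being rounded toward zero by a scale dominated by the largest head.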