"Not all quantized models perform well." Serving frameworks: Ollama uses the NVIDIA GPU, llama.cpp uses the CPU with AVX & AMX.
v1k
xbruce22
AI & ML interests
None yet
Recent Activity
liked a model 1 day ago: netflix/void-model
liked a model 3 days ago: google/gemma-4-E4B-it
liked a model 6 days ago: Qwen/Qwen2.5-VL-3B-Instruct