Clair v5
Clair is a personalized AI assistant fine-tuned from Qwen2.5-3B-Instruct with embedded identity. It runs efficiently on budget laptops (CPU-only, 8GB RAM) and maintains consistent identity across all interactions.
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen2.5-3B-Instruct |
| Parameters | 3.09B |
| Architecture | Qwen2 (Transformer) |
| Context Length | 4096 tokens |
| Training Method | LoRA (rank 32, alpha 64) |
| Training Epochs | 20 |
| Quantization | Q4_K_M, Q5_K_M, Q3_K_M (GGUF) |
Identity
- Name: Clair
- Creator: Michael Mlungisi Nkomo
- Origin: Zimbabwe
- Role: AI assistant for coding, math, writing, analysis, and general questions
Training
Dataset
- 95 examples with heavy identity emphasis
- 30+ identity questions with variations
- Explicit denials of being ChatGPT, Claude, Qwen
- Greetings, goodbyes, and normal conversations
- Multi-turn dialogues
Training Configuration
| Parameter | Value |
|---|---|
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Learning Rate | 1e-4 |
| Batch Size | 4 |
| Gradient Accumulation | 4 |
| Epochs | 20 |
| Quantization | 4-bit (NF4) |
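The configuration above implies a few derived quantities. The following is a minimal sketch of that arithmetic, not the author's training script. It assumes the standard LoRA scaling of alpha / rank, that the 95 examples listed under Dataset make up the whole training set, and that the last partial micro-batch is kept; the card states none of these explicitly.

```python
import math

# Values taken from the Training Configuration table and the Dataset list.
LORA_RANK = 32
LORA_ALPHA = 64
BATCH_SIZE = 4
GRAD_ACCUMULATION = 4
EPOCHS = 20
NUM_EXAMPLES = 95  # assumption: this is the full training set


def derived_training_quantities():
    """Return quantities implied by the configuration (illustrative only)."""
    # Standard LoRA scales the low-rank update by alpha / rank.
    lora_scaling = LORA_ALPHA / LORA_RANK
    # Number of examples that contribute to each optimizer update.
    effective_batch = BATCH_SIZE * GRAD_ACCUMULATION
    # Assumption: the final partial micro-batch is kept rather than dropped.
    micro_batches_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
    # Assumption: one optimizer step per full accumulation window.
    steps_per_epoch = micro_batches_per_epoch // GRAD_ACCUMULATION
    return {
        "lora_scaling": lora_scaling,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * EPOCHS,
    }


if __name__ == "__main__":
    for name, value in derived_training_quantities().items():
        print(f"{name}: {value}")
```

Under these assumptions each optimizer update sees 16 examples, and the whole run is on the order of a hundred updates. The exact step count depends on how the training framework handles partial batches.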
Results
| Metric | Value |
|---|---|
| Training Loss | 0.08047 |
| Token Accuracy | 97.3% |
| Identity Recognition | 100% |
Usage
With Transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("r245142r/Clair-3B")
model = AutoModelForCausalLM.from_pretrained("r245142r/Clair-3B")
messages = [{"role": "user", "content": "Who are you?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
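For decoder-only models, `generate` returns the prompt tokens followed by the completion, so decoding `outputs[0]` prints the templated prompt as well as the reply. A small sketch of slicing off the prompt follows; `completion_only` is a hypothetical helper name used for illustration, not part of Transformers.

```python
def completion_only(output_ids, prompt_length):
    """Return only the token IDs generated after the prompt.

    Works on any sliceable sequence, such as a list or a 1-D tensor.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Usage with the snippet above (assumes `inputs`, `outputs` and `tokenizer` exist):
#   prompt_length = inputs["input_ids"].shape[1]
#   reply_ids = completion_only(outputs[0], prompt_length)
#   print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```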
With Ollama
ollama run r245142r/Clair-3B
With llama.cpp (GGUF)
# Download the quantized model
wget https://huggingface.co/r245142r/Clair-3B/resolve/main/clair-v5-Q4_K_M.gguf
# Run with llama.cpp
./llama-cli -m clair-v5-Q4_K_M.gguf -p "Who are you?" -n 256
Available Files
| File | Size | Description |
|---|---|---|
| clair-v5-float16.gguf | 5.75 GB | Full precision GGUF |
| clair-v5-Q4_K_M.gguf | ~2.0 GB | 4-bit quantized (recommended) |
| clair-v5-Q5_K_M.gguf | ~2.5 GB | 5-bit quantized |
| clair-v5-Q3_K_M.gguf | ~1.5 GB | 3-bit quantized |
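The file sizes follow roughly from the parameter count and the number of bits stored per weight. The sketch below shows that estimate; it ignores GGUF metadata and assumes sizes are measured in binary gigabytes (GiB), which the card does not specify. The function name is illustrative.

```python
PARAMETERS = 3.09e9  # from the Model Details table


def estimated_size_gib(parameters, bits_per_weight):
    """Approximate weight storage in GiB (2**30 bytes), ignoring metadata."""
    return parameters * bits_per_weight / 8 / 2**30


# float16 stores exactly 16 bits per weight.
float16_size = estimated_size_gib(PARAMETERS, 16)

if __name__ == "__main__":
    print(f"float16: ~{float16_size:.2f} GiB")
```

For float16 this gives about 5.76 GiB, close to the listed 5.75 GB. K-quant formats such as Q4_K_M mix block types, so their average bits per weight is not exactly the nominal 3, 4 or 5; pass a measured value when estimating those files.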
Hardware Requirements
| Configuration | RAM | Speed |
|---|---|---|
| Q4_K_M (CPU) | ~2.5 GB | ~5-8 tokens/s |
| Q4_K_M (GPU) | ~2.5 GB | ~30-50 tokens/s |
| Float16 (GPU) | ~6 GB | ~40-60 tokens/s |
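To turn the throughput figures above into expected wait times, divide the number of tokens to generate by the speed. This is a rough sketch that ignores prompt-processing time and model load time; the function names are illustrative.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Seconds needed to generate num_tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def generation_time_range(num_tokens, slow_tps, fast_tps):
    """Return (best_case, worst_case) seconds for a throughput range."""
    return (
        generation_seconds(num_tokens, fast_tps),
        generation_seconds(num_tokens, slow_tps),
    )


if __name__ == "__main__":
    # 256 new tokens at the Q4_K_M CPU range of 5-8 tokens/s from the table.
    best, worst = generation_time_range(256, slow_tps=5, fast_tps=8)
    print(f"CPU: {best:.1f}s to {worst:.1f}s")
```

At the listed CPU speeds, a 256-token reply takes roughly half a minute to just under a minute.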
Benchmarks
Tested on a budget laptop (Intel i5, 8 GB DDR4, CPU-only):
- RAM Usage: ~6.8 GB total (within 7GB ceiling)
- Model Size: ~2.0 GB (Q4_K_M)
- Context Window: 4096 tokens
- Identity Accuracy: 100%
Development
Built for the ADTC 2026 LaptopLLM Challenge: running AI on budget hardware.
Key Achievements
- Runs on CPU-only laptops with 8GB RAM
- Embedded identity (not system prompt)
- Natural greetings and goodbyes
- 3x faster with Q4_K_M quantization
License
Apache 2.0
Citation
@misc{clair-v5,
author = {Michael Mlungisi Nkomo},
title = {Clair v5: Personalized AI Assistant with Embedded Identity},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/r245142r/Clair-3B}
}
Clair v5: Personalized AI with embedded identity, built from Zimbabwe for the world.