Successor to Nano-nano v4.
Same architecture family, ~8.5% larger, trained from scratch on a weighted mix of 17 datasets.
| Property | Value |
|---|---|
| Architecture | LLaMA (decoder-only) |
| Parameters | ~255.7 M |
| Context length | 2,048 tokens |
| Vocabulary | 50,264 tokens |
| Training loss | 5.1763 |
| Eval score | 16.7% |
| Trained on | 0.08 B tokens |
| Hardware | NVIDIA GTX 1080 8 GB (Pascal) |
| Training date | 2026-05-09 22:50 |
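Standard LLaMA decoder-only transformer. Compared with v4, the MLP intermediate size is ~8.3% wider (2,688 → 2,912) and one layer is added (14 → 15); the hidden size and attention heads are unchanged.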
| Hyperparameter | v4 | v4.5 |
|---|---|---|
| Parameters | ~236 M | ~255.7 M |
| hidden_size | 896 | 896 |
| intermediate_size | 2,688 | 2,912 |
| num_hidden_layers | 14 | 15 |
| num_attention_heads | 14 | 14 |
| num_key_value_heads | 14 | 14 |
| head_dim | 64 | 64 |
| vocab_size | 50,264 | 50,264 |
| max_position_embeddings | 1,024 | 2,048 |
| rms_norm_eps | 1e-6 | 1e-6 |
| rope_theta | 10,000 | 10,000 |
| hidden_act | SiLU | SiLU |
| tie_word_embeddings | False | False |
| attention_bias | False | False |
| mlp_bias | False | False |
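As a sanity check, the parameter counts in the hyperparameter table can be reproduced from the other rows. This is a minimal sketch assuming the standard Hugging Face LLaMA layout: separate q/k/v/o attention projections, a gated SiLU MLP (gate, up and down projections), two RMSNorm weight vectors per layer plus a final norm, untied input and output embeddings, and no biases. `llama_param_count` is an illustrative helper, not part of any library.

```python
def llama_param_count(
    vocab_size: int,
    hidden_size: int,
    intermediate_size: int,
    num_layers: int,
    num_heads: int,
    num_kv_heads: int,
    head_dim: int,
    tie_word_embeddings: bool = False,
) -> int:
    """Count parameters of a bias-free LLaMA-style decoder (illustrative sketch)."""
    # Attention: q and o span all heads; k and v span the key/value heads.
    q_proj = hidden_size * num_heads * head_dim
    k_proj = hidden_size * num_kv_heads * head_dim
    v_proj = hidden_size * num_kv_heads * head_dim
    o_proj = num_heads * head_dim * hidden_size
    attention = q_proj + k_proj + v_proj + o_proj
    # Gated MLP: gate_proj and up_proj (hidden -> intermediate), down_proj back.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer (input and post-attention).
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embed = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embed + num_layers * per_layer + final_norm + lm_head


v4 = llama_param_count(50_264, 896, 2_688, 14, 14, 14, 64)
v45 = llama_param_count(50_264, 896, 2_912, 15, 14, 14, 64)
print(f"v4:   {v4:,}")    # v4:   236,211,584
print(f"v4.5: {v45:,}")   # v4.5: 255,681,664
print(f"growth: {v45 / v4 - 1:.1%}")  # growth: 8.2%
```

Under these assumptions the counts land on ~236.2 M and ~255.7 M, matching the table; the untied output head alone accounts for about 45 M parameters of each model.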
Automatically evaluated after training across 5 capability dimensions.
| Category | Hits | Score |
|---|---|---|
| Knowledge | 0/5 | 🔴 0% |
| Reasoning | 0/4 | 🔴 0% |
| Hallucination | 0/4 | 🔴 0% |
| Instruction | 2/4 | 🟡 50% |
| Coherence | 1/3 | 🔴 33% |
| Overall | — | 🔴 17% |
The Hallucination category measures resistance to hallucination: whether the model appropriately declines questions about future events, fictional entities, or impossible premises instead of confabulating an answer.
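The Overall row (and the 16.7% eval score above) appears to be an unweighted mean of the five category scores rather than a pooled hit rate; pooling would give 3 hits out of 20 probes, or 15%. A minimal sketch of that arithmetic, assuming equal weighting per category; the `results` layout is illustrative:

```python
# Hits and probe counts per category, copied from the evaluation table.
results = {
    "Knowledge": (0, 5),
    "Reasoning": (0, 4),
    "Hallucination": (0, 4),
    "Instruction": (2, 4),
    "Coherence": (1, 3),
}

# Per-category score as a percentage.
scores = {name: 100 * hits / total for name, (hits, total) in results.items()}

# Unweighted mean over categories (assumed to be how "Overall" is computed).
macro_avg = sum(scores.values()) / len(scores)

# Pooled alternative: total hits divided by total probes.
pooled = 100 * sum(h for h, _ in results.values()) / sum(t for _, t in results.values())

print(f"macro average: {macro_avg:.1f}%")  # macro average: 16.7%
print(f"pooled: {pooled:.1f}%")            # pooled: 15.0%
```

With so few probes per category, a single extra hit moves a category score by 20 to 33 points, so the overall figure is best read as a coarse signal.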
| Setting | Value |
|---|---|
| Hardware | GTX 1080 8 GB · Pascal · compute capability 6.1 |
| Precision | fp32 weights, fp16 mixed precision (AMP with GradScaler) |
| Optimizer | StovetopCooker (HyperNix, pre-Volta) |
| LR | 1e-4, cosine decay |
| Warmup | 6% of steps |
| Embedding freeze | First 15% of steps |
| Effective batch | 8 × 2,048 = 16,384 tokens/step |
| Steps | 5,092 |
| Total tokens | 0.08 B |
| Grad clipping | 1.0 |
| Grad checkpointing | ✅ |
| Peak VRAM | 5.34 GB |
| HyperNix | ✅ freezer · StovetopCooker · old_fridge · new_fridge · smoke_alarm · pans · smoker |
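The schedule rows can be read as a short warmup followed by cosine decay, with the embeddings frozen for the first 15% of steps. This sketch assumes linear warmup from 0 and decay to a final LR of 0 (the card does not state the minimum LR); `lr_at` and `embeddings_frozen` are hypothetical helpers, not part of HyperNix or StovetopCooker. It also checks the token budget: 5,092 steps × 16,384 tokens/step ≈ 0.083 B tokens.

```python
import math

TOTAL_STEPS = 5_092
PEAK_LR = 1e-4
WARMUP_STEPS = int(0.06 * TOTAL_STEPS)        # 6% of steps -> 305
EMBED_FREEZE_STEPS = int(0.15 * TOTAL_STEPS)  # 15% of steps -> 763
TOKENS_PER_STEP = 8 * 2_048                   # effective batch in tokens
MIN_LR = 0.0                                  # assumption: not stated in the card


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to PEAK_LR.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from PEAK_LR down to MIN_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


def embeddings_frozen(step: int) -> bool:
    """True while the embedding matrix should receive no updates."""
    return step < EMBED_FREEZE_STEPS


total_tokens = TOTAL_STEPS * TOKENS_PER_STEP
print(f"total tokens: {total_tokens:,} (~{total_tokens / 1e9:.2f} B)")
for step in (0, WARMUP_STEPS, EMBED_FREEZE_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}", embeddings_frozen(step))
```

In a PyTorch training loop, the frozen phase would typically be implemented by toggling `requires_grad` on the embedding weights when `embeddings_frozen(step)` changes value.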
| Dataset | Samples | Weight | Category |
|---|---|---|---|
| Roman1111111/claude-opus-4.6-10000x | 10 k | 2.5× | Claude conversations |
| WithinUsAI/GPT5.5_thinking_max_distill_god_seed_25K | 25 k | 2.0× | Reasoning / thinking |
| HuggingFaceH4/MATH-500 | 500 | 2.0× | Competition math |
| lighteval/MATH-Hard | 10 k | 2.0× | Hard math |
| garage-bAInd/Open-Platypus | 25 k | 1.8× | Reasoning instruction |
| iamtarun/python_code_instructions_18k_alpaca | 8 k | 1.6× | Python code |
| b-mc2/sql-create-context | 6 k | 1.4× | SQL code |
| nvidia/OpenCodeInstruct | 30 k | 1.5× | Code instruction |
| teknium/OpenHermes-2.5 | 30 k | 1.5× | General instruction |
| Amod/mental_health_counseling_conversations | 5 k | 1.2× | Chat / counseling |
| ray0rf1re/FineWeb-Nano | 50 k | 1.0× | Web text |
| tonytins/chat-dataset | 10 k | 1.0× | Conversation |
| databricks/databricks-dolly-15k | 15 k | 1.0× | Instruction following |
| mlabonne/guanaco-llama2-1k | 1 k | 1.0× | General QA |
| ray0rf1re/hyper-pip | 20 k | 2.0× | HyperNix pip data |
| HuggingFaceH4/ultrachat_200k | 30 k | 1.5× | Multi-turn chat |
| fka/awesome-chatgpt-prompts | 5 k | 0.8× | Prompt engineering |
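One common way to apply per-dataset weights is to treat each weight as a multiplier on the dataset's sample count and draw sources in proportion to the product. The card does not say exactly how HyperNix applies the weights, so this is a sketch under that assumption, using the sample counts as listed in the dataset table (rounded, e.g. "10 k" taken as 10,000). `MIXTURE`, `mixture_shares` and `sample_sources` are illustrative names.

```python
import random

# (dataset, samples, weight) as listed in the dataset table.
MIXTURE = [
    ("Roman1111111/claude-opus-4.6-10000x", 10_000, 2.5),
    ("WithinUsAI/GPT5.5_thinking_max_distill_god_seed_25K", 25_000, 2.0),
    ("HuggingFaceH4/MATH-500", 500, 2.0),
    ("lighteval/MATH-Hard", 10_000, 2.0),
    ("garage-bAInd/Open-Platypus", 25_000, 1.8),
    ("iamtarun/python_code_instructions_18k_alpaca", 8_000, 1.6),
    ("b-mc2/sql-create-context", 6_000, 1.4),
    ("nvidia/OpenCodeInstruct", 30_000, 1.5),
    ("teknium/OpenHermes-2.5", 30_000, 1.5),
    ("Amod/mental_health_counseling_conversations", 5_000, 1.2),
    ("ray0rf1re/FineWeb-Nano", 50_000, 1.0),
    ("tonytins/chat-dataset", 10_000, 1.0),
    ("databricks/databricks-dolly-15k", 15_000, 1.0),
    ("mlabonne/guanaco-llama2-1k", 1_000, 1.0),
    ("ray0rf1re/hyper-pip", 20_000, 2.0),
    ("HuggingFaceH4/ultrachat_200k", 30_000, 1.5),
    ("fka/awesome-chatgpt-prompts", 5_000, 0.8),
]


def mixture_shares(mixture):
    """Fraction of the training mix each dataset contributes (samples x weight)."""
    effective = {name: n * w for name, n, w in mixture}
    total = sum(effective.values())
    return {name: eff / total for name, eff in effective.items()}


def sample_sources(mixture, k, seed=0):
    """Draw k dataset names with probability proportional to samples x weight."""
    rng = random.Random(seed)
    names = [name for name, _, _ in mixture]
    weights = [n * w for _, n, w in mixture]
    return rng.choices(names, weights=weights, k=k)


shares = mixture_shares(MIXTURE)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{share:6.1%}  {name}")
print(sample_sources(MIXTURE, k=5))
```

Under this reading, a 2.0× weight on a 500-sample set such as MATH-500 still leaves it well under 1% of the mix, so the weight matters far less than the raw sample count for small datasets.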
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" requires the `accelerate` package to be installed.
model = AutoModelForCausalLM.from_pretrained(
    "ray0rf1re/Nano-nano_v4.5",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("ray0rf1re/Nano-nano_v4.5")


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Instruction/Response prompt format; newlines are escaped so the f-string is valid.
    text = f"### Instruction:\n{prompt}\n### Response:\n"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,  # reuse EOS as the padding token
    )
    # Decode only the newly generated tokens, dropping the echoed prompt.
    new_ids = out[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True).strip()


# Examples
print(generate("Write a Python function to reverse a linked list."))
print(generate("What is the capital of France?"))
print(generate("Explain gradient descent in simple terms."))
@misc{nano-nano-v45,
author = {ray0rf1re},
title = {Nano-nano v4.5: Compact LLaMA-Family Causal LM},
year = {2026},
publisher = {HuggingFace},
howpublished = {https://huggingface.co/ray0rf1re/Nano-nano_v4.5},
}