# nano
Model repository: Kanonenbombe/nano
Sparse MoE coding model checkpoint exported from the local training_moe20 pipeline.
## Open Weights Only
This repository intentionally contains only open model artifacts for inference/fine-tuning:
- `model.ckpt` (weights checkpoint)
- `tokenizer.json` (tokenizer)
- `train_config.yaml` (training config snapshot)
- `open_weights_manifest.json` (hashes + sizes)
No training scripts, dataset pipeline code, or private data are uploaded.
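
Since the manifest ships hashes and sizes, downloaded artifacts can be checked before use. The sketch below assumes a hypothetical manifest schema (`files` entries with `name`, `sha256`, and `size` fields); the actual layout of `open_weights_manifest.json` may differ, so adjust the field names accordingly.

```python
import hashlib
import json
import os

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large checkpoints don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest_path: str = "open_weights_manifest.json") -> None:
    # Assumed schema: {"files": [{"name": ..., "sha256": ..., "size": ...}]}.
    with open(manifest_path) as f:
        manifest = json.load(f)
    for entry in manifest["files"]:
        name = entry["name"]
        ok_size = os.path.getsize(name) == entry["size"]
        ok_hash = sha256_of(name) == entry["sha256"]
        print(f"{name}: size={'ok' if ok_size else 'MISMATCH'} "
              f"sha256={'ok' if ok_hash else 'MISMATCH'}")

if __name__ == "__main__":
    verify()
```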
## Architecture Summary
- Experts: 20
- Router top-k: 2
- Max loaded experts (residency): 20
- Hidden size: 1536
- Transformer layers: 18
- Attention heads: 12
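
For orientation, these numbers describe a standard top-k gating setup: each token's router produces logits over the 20 experts, the best 2 are kept, and their scores are renormalized. The PyTorch sketch below is a generic illustration of that routing math, not the actual router code from the training_moe20 pipeline.

```python
import torch
import torch.nn.functional as F

# Generic top-k softmax router matching the numbers above:
# 20 experts, top-k = 2, hidden size 1536.
NUM_EXPERTS, TOP_K, HIDDEN = 20, 2, 1536

router = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)

def route(tokens: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """tokens: (batch, hidden) -> (expert indices, mixing weights) per token."""
    logits = router(tokens)                   # (batch, 20)
    topk = logits.topk(TOP_K, dim=-1)         # keep the 2 best experts per token
    weights = F.softmax(topk.values, dim=-1)  # renormalize over the selected pair
    return topk.indices, weights

x = torch.randn(4, HIDDEN)
indices, weights = route(x)
print(indices.shape, weights.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```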
## Expert Names
`writer.en`, `writer.de`, `writer.es`, `python.coder`, `python.bugfix`, `java.coder`, `java.bugfix`, `go.coder`, `go.bugfix`, `javascript.coder`, `typescript.coder`, `rust.coder`, `tooling.terminal`, `tooling.git`, `tooling.tests`, `tooling.ci`, `tooling.packaging`, `tooling.docker`, `reasoning.long`, `eval.holdout`
## Example Usage (this repo tooling)
```bash
moe20-chat \
  --config training_moe20/configs/train_moe20_a100_80gb_transformer_v2.yaml \
  --checkpoint model.ckpt \
  --force-experts writer.en,python.coder \
  --debug
```
## Notes
- Response quality depends strongly on the training step and on tokenizer quality.
- Prefer immutable checkpoints like `step-XXXXX.ckpt` for reproducible evaluation (see the pinning sketch below).
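
A minimal sketch of what pinning looks like in practice, assuming the `step-XXXXX.ckpt` naming from the note above (the concrete step number is only a placeholder). Recording the checkpoint's hash alongside the results makes the evaluation run reproducible. Requires Python 3.11+ for `hashlib.file_digest`.

```python
import glob
import hashlib
import re

def pick_checkpoint(step: int) -> str:
    # Resolve an immutable step checkpoint instead of a moving alias
    # like model.ckpt.
    for path in sorted(glob.glob("step-*.ckpt")):
        m = re.fullmatch(r"step-(\d+)\.ckpt", path)
        if m and int(m.group(1)) == step:
            return path
    raise FileNotFoundError(f"no checkpoint found for step {step}")

def fingerprint(path: str) -> str:
    # Record this hash alongside eval results so the run can be repeated
    # against the exact same weights.
    with open(path, "rb") as f:
        return hashlib.file_digest(f, "sha256").hexdigest()

ckpt = pick_checkpoint(50_000)  # hypothetical step number
print(ckpt, fingerprint(ckpt)[:16])
```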
## Publisher Notes
This checkpoint is trained for sparse MoE coding assistance with explicit router supervision.
Current intended usage:
- code completion and small coding tasks
- structured tool-aware prompts
- expert-constrained inference (`--force-experts`)
Known limitations:
- still mid-training; code quality depends on checkpoint step
- may produce repetitive output on generic prompts
- best results currently come from constrained expert pairs and coding-focused prompts
Safety:
- no private chain-of-thought targets
- no private user data included in release artifacts