# nano
Model repository: Kanonenbombe/nano
Sparse MoE coding model checkpoint exported from the local training_moe20 pipeline.
## Open Weights Only
This repository intentionally contains only open model artifacts for inference/fine-tuning:
- `model.ckpt` (weights checkpoint)
- `tokenizer.json` (tokenizer)
- `train_config.yaml` (training config snapshot)
- `open_weights_manifest.json` (hashes + sizes)
No training scripts, dataset pipeline code, or private data are uploaded.
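
Since the manifest ships hashes and sizes, downloaded artifacts can be checked before use. The sketch below assumes a hypothetical manifest schema (`files` entries with `name`, `sha256`, and `size` fields); the actual layout of `open_weights_manifest.json` may differ, so adjust the field names accordingly.

```python
import hashlib
import json
import os

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large checkpoints don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest_path: str = "open_weights_manifest.json") -> None:
    # Assumed schema: {"files": [{"name": ..., "sha256": ..., "size": ...}]}.
    with open(manifest_path) as f:
        manifest = json.load(f)
    for entry in manifest["files"]:
        name = entry["name"]
        ok_size = os.path.getsize(name) == entry["size"]
        ok_hash = sha256_of(name) == entry["sha256"]
        print(f"{name}: size={'ok' if ok_size else 'MISMATCH'} "
              f"sha256={'ok' if ok_hash else 'MISMATCH'}")

if __name__ == "__main__":
    verify()
```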
## Architecture Summary
- Experts: 20
- Router top-k: 2
- Max loaded experts (residency): 20
- Hidden size: 1536
- Transformer layers: 18
- Attention heads: 12
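
For orientation, these numbers describe a standard top-k gating setup: each token's router produces logits over the 20 experts, the best 2 are kept, and their scores are renormalized. The PyTorch sketch below is a generic illustration of that routing math, not the actual router code from the training_moe20 pipeline.

```python
import torch
import torch.nn.functional as F

# Generic top-k softmax router matching the numbers above:
# 20 experts, top-k = 2, hidden size 1536.
NUM_EXPERTS, TOP_K, HIDDEN = 20, 2, 1536

router = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)

def route(tokens: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """tokens: (batch, hidden) -> (expert indices, mixing weights) per token."""
    logits = router(tokens)                   # (batch, 20)
    topk = logits.topk(TOP_K, dim=-1)         # keep the 2 best experts per token
    weights = F.softmax(topk.values, dim=-1)  # renormalize over the selected pair
    return topk.indices, weights

x = torch.randn(4, HIDDEN)
indices, weights = route(x)
print(indices.shape, weights.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```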
## Expert Names
`writer.en`, `writer.de`, `writer.es`, `python.coder`, `python.bugfix`, `java.coder`, `java.bugfix`, `go.coder`, `go.bugfix`, `javascript.coder`, `typescript.coder`, `rust.coder`, `tooling.terminal`, `tooling.git`, `tooling.tests`, `tooling.ci`, `tooling.packaging`, `tooling.docker`, `reasoning.long`, `eval.holdout`
## Example Usage (this repo tooling)
```bash
moe20-chat \
  --config training_moe20/configs/train_moe20_a100_80gb_transformer_v2.yaml \
  --checkpoint model.ckpt \
  --force-experts writer.en,python.coder \
  --debug
```
## Notes
- Response quality depends strongly on the training step and on tokenizer quality.
- Prefer immutable checkpoints like `step-XXXXX.ckpt` for reproducible evaluation (see the pinning sketch below).
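
A minimal sketch of what pinning looks like in practice, assuming the `step-XXXXX.ckpt` naming from the note above (the concrete step number is only a placeholder). Recording the checkpoint's hash alongside the results makes the evaluation run reproducible. Requires Python 3.11+ for `hashlib.file_digest`.

```python
import glob
import hashlib
import re

def pick_checkpoint(step: int) -> str:
    # Resolve an immutable step checkpoint instead of a moving alias
    # like model.ckpt.
    for path in sorted(glob.glob("step-*.ckpt")):
        m = re.fullmatch(r"step-(\d+)\.ckpt", path)
        if m and int(m.group(1)) == step:
            return path
    raise FileNotFoundError(f"no checkpoint found for step {step}")

def fingerprint(path: str) -> str:
    # Record this hash alongside eval results so the run can be repeated
    # against the exact same weights.
    with open(path, "rb") as f:
        return hashlib.file_digest(f, "sha256").hexdigest()

ckpt = pick_checkpoint(50_000)  # hypothetical step number
print(ckpt, fingerprint(ckpt)[:16])
```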
## Publisher Notes
This checkpoint is trained for sparse MoE coding assistance with explicit router supervision.
Current intended usage:
- code completion and small coding tasks
- structured tool-aware prompts
- expert-constrained inference (`--force-experts`)
Known limitations:
- still mid-training; code quality depends on checkpoint step
- may produce repetitive output on generic prompts
- best results currently come from constrained expert pairs and coding-focused prompts
Safety:
- no private chain-of-thought targets
- no private user data included in release artifacts