nano

Model repository: Kanonenbombe/nano

Sparse MoE coding model checkpoint exported from the local training_moe20 pipeline.

Open Weights Only

This repository intentionally contains only open model artifacts for inference/fine-tuning:

  • model.ckpt (weights checkpoint)
  • tokenizer.json (tokenizer)
  • train_config.yaml (training config snapshot)
  • open_weights_manifest.json (hashes + sizes)

No training scripts, dataset pipeline code, or private data are uploaded.
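The open_weights_manifest.json hashes can be used to verify a download before loading the checkpoint. A minimal sketch, assuming the manifest maps each filename to a SHA-256 hex digest and byte size (the actual schema may differ):

```python
# Sketch: verify downloaded artifacts against open_weights_manifest.json.
# ASSUMPTION: manifest schema is {"model.ckpt": {"sha256": "...", "size": N}, ...};
# adjust the key names if the real manifest differs.
import hashlib
import json
import os

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large checkpoints need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest_path: str, root: str = ".") -> list:
    """Return the names of files whose size or hash does not match."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    bad = []
    for name, meta in manifest.items():
        path = os.path.join(root, name)
        if (not os.path.exists(path)
                or os.path.getsize(path) != meta["size"]
                or sha256_of(path) != meta["sha256"]):
            bad.append(name)
    return bad
```

Checking the size first is cheap and catches truncated downloads without hashing the whole file.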

Architecture Summary

  • Experts: 20
  • Router top-k: 2
  • Max loaded experts (residency): 20
  • Hidden size: 1536
  • Transformer layers: 18
  • Attention heads: 12
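The routing configuration above (20 experts, top-k = 2) means each token's router scores all 20 experts but dispatches to only the 2 highest-scoring ones. An illustrative sketch of that selection step, not the checkpoint's actual router code:

```python
# Sketch of top-k expert routing: keep the k highest router logits and
# renormalize their softmax weights so the selected gates sum to 1.
# Illustrative only; the real router in this checkpoint may differ.
import math

def route(logits, k=2):
    """Return k (expert_index, gate_weight) pairs, highest weight first."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]
```

With 20 experts and top-k = 2, only 2/20 of the expert FFN compute runs per token, which is the usual motivation for sparse MoE layers.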

Expert Names

writer.en, writer.de, writer.es, python.coder, python.bugfix, java.coder, java.bugfix, go.coder, go.bugfix, javascript.coder, typescript.coder, rust.coder, tooling.terminal, tooling.git, tooling.tests, tooling.ci, tooling.packaging, tooling.docker, reasoning.long, eval.holdout

Example Usage (this repo tooling)

moe20-chat \
  --config training_moe20/configs/train_moe20_a100_80gb_transformer_v2.yaml \
  --checkpoint model.ckpt \
  --force-experts writer.en,python.coder \
  --debug

Notes

  • Response quality depends strongly on training step and tokenizer quality.
  • Prefer immutable checkpoints like step-XXXXX.ckpt for reproducible evaluation.

Publisher Notes

This checkpoint is trained for sparse MoE coding assistance with explicit router supervision.

Current intended usage:

  • code completion and small coding tasks
  • structured tool-aware prompts
  • expert-constrained inference (--force-experts)
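One plausible reading of expert-constrained inference (the --force-experts flag) is masking the router logits so only the named experts are eligible before top-k selection; the flag's real mechanics in moe20-chat are not documented here, so treat this as an assumption:

```python
# Sketch: constrain routing to a forced expert subset by masking logits.
# ASSUMPTION: --force-experts works by making non-forced experts
# unselectable; the actual moe20-chat implementation may differ.
NEG_INF = float("-inf")

def mask_logits(logits, names, forced):
    """Set every non-forced expert's logit to -inf so that a subsequent
    top-k selection can only pick experts in `forced`."""
    return [x if n in forced else NEG_INF for n, x in zip(names, logits)]
```

Applied before the top-2 selection, a call like mask_logits(logits, expert_names, {"writer.en", "python.coder"}) guarantees both routed experts come from the forced pair.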

Known limitations:

  • still mid-training; code quality depends on checkpoint step
  • may produce repetitive output on generic prompts
  • best results currently come from constrained expert pairs and coding-focused prompts

Safety:

  • no private chain-of-thought targets
  • no private user data included in release artifacts