Isabel-50M

A tiny (~54M-parameter) language model for on-device use, trained entirely from scratch with no base model.

  • Created by: Malios Dark
  • Organization: Ideoa Labs
  • Parameters: ~54M
  • Language: English
  • License: Apache 2.0
  • Base model: none. Weights are randomly initialized and trained from scratch.

Isabel is a human name. It was chosen to make the model feel approachable and personal, in line with Ideoa Labs' mission of accessible, on-device AI. It is not a fine-tune of any released model: the architecture, the byte-level BPE tokenizer, and every weight are our own.

How it is built

A two-stage recipe on a single GPU (RTX 3090 Ti):

  1. From-scratch pretraining on open, permissively licensed educational English text, plus our own generated reasoning and arithmetic data. The tokenizer is digit-level (each digit is its own token) so that the model can learn arithmetic. Standard sub-word tokenizers make this much harder because they merge runs of digits into multi-digit tokens.
  2. Targeted fine-tuning on the official train splits of benchmark tasks (ARC, OpenBookQA, SciQ, QASC, CommonsenseQA, HellaSwag, WinoGrande), plus a large set of generated arithmetic problems. Only train splits are used, and no test-set data is included.
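The card does not describe how the generated arithmetic is produced. The sketch below is only an illustration, and several of its details are hypothetical: the question/answer template, the three operators and the operand range. It shows two useful properties of synthetic arithmetic. Every problem comes with an exact answer computed by the generator, and a fixed seed makes the whole set reproducible.

```python
import random

# Hypothetical prompt/answer template: the model card does not say how
# Isabel's generated arithmetic is formatted.
TEMPLATE = "Q: {a} {op} {b} = ?\nA: {answer}"

# Supported operations (an assumption; the card only says "arithmetic").
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def make_arithmetic_example(rng: random.Random, max_digits: int = 4) -> str:
    """Build one problem whose answer is computed exactly with Python ints."""
    upper = 10 ** max_digits - 1
    a = rng.randint(0, upper)
    b = rng.randint(0, upper)
    op = rng.choice(sorted(OPS))
    return TEMPLATE.format(a=a, op=op, b=b, answer=OPS[op](a, b))


def make_dataset(n: int, seed: int = 0, max_digits: int = 4) -> list:
    """Generate n examples; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    return [make_arithmetic_example(rng, max_digits) for _ in range(n)]


if __name__ == "__main__":
    for example in make_dataset(3):
        print(example, end="\n\n")
```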

We attribute Isabel-50M's arithmetic score, the highest in its size class, mainly to the digit-level tokenizer, together with the generated arithmetic data used in both training stages.
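The card does not publish its tokenizer code. Below is a minimal sketch that assumes digits are isolated in a pre-tokenization step before BPE merges are learned. This is one common way to guarantee one token per digit. The Hugging Face `tokenizers` library ships a comparable `Digits(individual_digits=True)` pre-tokenizer, but we do not know whether Isabel uses it.

The toy vocabulary and the greedy matcher are hypothetical. They only show how a sub-word vocabulary with multi-digit tokens can put the same digit into different tokens depending on its neighbours.

```python
import re

# Pre-tokenization pattern: every digit is its own piece, while runs of
# whitespace and runs of other characters stay whole for later BPE merges.
_PIECE = re.compile(r"\d|\s+|[^\d\s]+")


def split_digits(text: str) -> list:
    """Split text so that no piece contains more than one digit."""
    return _PIECE.findall(text)


# Hypothetical toy sub-word vocabulary that contains multi-digit tokens.
TOY_VOCAB = {"12", "123", "34"}


def greedy_longest_match(text: str, vocab: set) -> list:
    """Toy sub-word tokenizer: always take the longest vocabulary match,
    falling back to a single character."""
    pieces, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                pieces.append(text[i:j])
                i = j
                break
    return pieces


if __name__ == "__main__":
    print(split_digits("Add 1234 and 56."))
    # The digit "3" is merged into "123" in the first number
    # but into "34" in the second one.
    print(greedy_longest_match("1234", TOY_VOCAB))
    print(greedy_longest_match("3412", TOY_VOCAB))
```

With digit splitting, "1234" always becomes the same four digit tokens. Place value is then expressed by position rather than hidden inside arbitrary merges.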

Architecture

Type             Decoder-only transformer
Hidden size      512
Layers           9
Attention heads  8
Vocabulary       32,000 tokens (our own byte-level BPE with one token per digit)
Context length   1,024 tokens
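As a sanity check on these hyperparameters, the parameter count can be estimated directly. The card does not give the following details:

  • the feed-forward width
  • the normalization type
  • the positional scheme
  • whether input and output embeddings are tied

The sketch fills these in with loudly labeled assumptions: a SwiGLU-style FFN of width 2048, RMSNorm, no biases, no learned position table, tied embeddings and standard multi-head attention. `IsabelLikeConfig` is a hypothetical name. Under these assumptions the estimate is 54,142,464, close to the ~54M stated above. Other combinations could land nearby, so this does not confirm Isabel's actual layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsabelLikeConfig:
    # Values from the architecture table.
    vocab_size: int = 32_000
    d_model: int = 512
    n_layers: int = 9
    n_heads: int = 8
    context: int = 1024
    # Assumptions: NOT stated in the model card.
    d_ff: int = 2048                 # feed-forward hidden width
    gated_ffn: bool = True           # SwiGLU-style FFN with 3 weight matrices
    tied_embeddings: bool = True     # output head reuses the token embedding
    learned_positions: bool = False  # e.g. rotary positions add no weights


def estimate_params(cfg: IsabelLikeConfig) -> int:
    """Count weights of a bias-free decoder-only transformer with RMSNorm
    and standard multi-head attention (the head count does not change it)."""
    assert cfg.d_model % cfg.n_heads == 0, "d_model must split evenly across heads"
    d = cfg.d_model
    attn = 4 * d * d                                  # Q, K, V, output projections
    ffn = (3 if cfg.gated_ffn else 2) * d * cfg.d_ff  # FFN weight matrices
    norms = 2 * d                                     # two RMSNorm gains per block
    total = cfg.n_layers * (attn + ffn + norms)
    total += cfg.vocab_size * d                       # token embedding
    if not cfg.tied_embeddings:
        total += cfg.vocab_size * d                   # separate output head
    if cfg.learned_positions:
        total += cfg.context * d                      # learned position table
    total += d                                        # final norm
    return total


if __name__ == "__main__":
    print(f"{estimate_params(IsabelLikeConfig()):,}")
```

The embedding matrix alone is 32,000 × 512 = 16,384,000 weights. Untying it adds that much again, which is why the tying assumption matters most here.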

Evaluation (0-shot, full test sets)

Benchmark                 Isabel-50M (%)
HellaSwag (acc_norm)      27.1
ARC-Easy (acc_norm)       43.8
ARC-Challenge (acc_norm)  23.5
PIQA (acc_norm)           57.3
ArithMark-2 (acc)         42.4
Average                   40.1
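Here is how the two metrics differ. In a 0-shot multiple-choice evaluation, the model scores each candidate answer by its log-likelihood given the question, and the highest-scoring candidate is the prediction.

  • acc uses the raw summed log-likelihood, which favours short answers.
  • acc_norm first divides each score by the answer's length.

The card does not name its evaluation harness. The sketch assumes character-length normalization, which is our understanding of how EleutherAI's lm-evaluation-harness computes acc_norm. The toy log-likelihoods are made up for illustration.

```python
def pick_answer(loglikelihoods, choices, normalize=False):
    """Index of the best candidate answer.

    loglikelihoods[i] is the model's summed log-probability of choices[i]
    given the question. With normalize=True each score is divided by the
    character length of its choice (acc_norm); otherwise it is used as is (acc).
    """
    scores = [ll / len(c) if normalize else ll for ll, c in zip(loglikelihoods, choices)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(examples, normalize=False):
    """Percentage of (loglikelihoods, choices, gold_index) examples answered correctly."""
    examples = list(examples)
    hits = sum(pick_answer(lls, ch, normalize) == gold for lls, ch, gold in examples)
    return 100.0 * hits / len(examples)


if __name__ == "__main__":
    # Made-up numbers: the longer correct answer has a lower total
    # log-likelihood but a higher per-character one.
    toy = [([-6.0, -10.0], [" a cat", " a very large elephant"], 1)]
    print("acc     :", accuracy(toy))
    print("acc_norm:", accuracy(toy, normalize=True))
```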

Comparison within the ~50M size class

Scores for the other models of similar size are taken from the public small-model leaderboard. Isabel-50M was trained from scratch in a matter of hours on a single consumer GPU, and it has the highest reported average in the class.

Model                       Params  Avg   HellaSwag  ARC-E  ARC-C  PIQA  ArithMark-2
Isabel-50M                  ~54M    40.1  27.1       43.8   23.5   57.3  42.4
Supra-1.5-50M-base-exp      51.8M   39.0  29.8       48.4   25.5   60.0  31.3
Supra-1.5-50M-Instruct-exp  51.8M   37.7  29.3       43.9   26.1   59.4  29.8
Veyra2-Apricot-50M-Base     49.3M   37.6  31.3       42.5   23.3   62.1  29.0
Quark-50M                   56.7M   37.3  28.5       36.8   25.0   57.8  28.2
Supra-50M-Base              51.8M   37.1  31.8       45.9   25.0   62.5  27.0
Supra-50M-Instruct          51.8M   35.9  29.1       44.4   27.3   59.5  29.1
Shard-1                     54.5M   35.6  29.2       41.1   21.0   58.2  26.8
Veyra-30M-Base              34.6M   34.7  27.9       35.9   24.2   58.9  26.8
Stentor3-50M                50.0M   32.5  27.1       29.7   21.7   53.8  29.5
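The card does not say how the Avg column is computed. For several rows it is not the unweighted mean of the five benchmark scores shown; it may include other tasks or a different weighting. The snippet below copies the comparison table and recomputes that five-column mean next to the reported value.

For Isabel-50M the five-column mean is 38.82, against a reported 40.1. On that mean, Supra-1.5-50M-base-exp (39.00) comes out marginally ahead. The ranking therefore depends on how Avg is defined.

```python
from statistics import mean

# Copied from the comparison table: HellaSwag, ARC-E, ARC-C, PIQA, ArithMark-2.
SCORES = {
    "Isabel-50M": (27.1, 43.8, 23.5, 57.3, 42.4),
    "Supra-1.5-50M-base-exp": (29.8, 48.4, 25.5, 60.0, 31.3),
    "Supra-1.5-50M-Instruct-exp": (29.3, 43.9, 26.1, 59.4, 29.8),
    "Veyra2-Apricot-50M-Base": (31.3, 42.5, 23.3, 62.1, 29.0),
    "Quark-50M": (28.5, 36.8, 25.0, 57.8, 28.2),
    "Supra-50M-Base": (31.8, 45.9, 25.0, 62.5, 27.0),
    "Supra-50M-Instruct": (29.1, 44.4, 27.3, 59.5, 29.1),
    "Shard-1": (29.2, 41.1, 21.0, 58.2, 26.8),
    "Veyra-30M-Base": (27.9, 35.9, 24.2, 58.9, 26.8),
    "Stentor3-50M": (27.1, 29.7, 21.7, 53.8, 29.5),
}

# The "Avg" column exactly as reported in the table.
REPORTED_AVG = {
    "Isabel-50M": 40.1,
    "Supra-1.5-50M-base-exp": 39.0,
    "Supra-1.5-50M-Instruct-exp": 37.7,
    "Veyra2-Apricot-50M-Base": 37.6,
    "Quark-50M": 37.3,
    "Supra-50M-Base": 37.1,
    "Supra-50M-Instruct": 35.9,
    "Shard-1": 35.6,
    "Veyra-30M-Base": 34.7,
    "Stentor3-50M": 32.5,
}

if __name__ == "__main__":
    # Rank models by the unweighted mean of the five columns shown.
    for name in sorted(SCORES, key=lambda m: mean(SCORES[m]), reverse=True):
        print(f"{name:28s} reported {REPORTED_AVG[name]:5.1f}   mean of 5 {mean(SCORES[name]):6.2f}")
```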

Isabel-50M has the highest reported average and the highest arithmetic score in the class; the arithmetic result is driven mainly by its digit-level tokenizer. It is weaker on commonsense completion. Its HellaSwag score (27.1) ties with Stentor3-50M for the lowest in the table, which we attribute to its small size and short training budget.

Citation

@misc{isabel_2026,
  title  = {Isabel-50M: A Tiny From-Scratch Language Model for the Edge},
  author = {Malios Dark},
  year   = {2026},
  note   = {Ideoa Labs}
}

Safetensors weights: 54.1M parameters, BF16 tensors.