Isabel-50M

A tiny (~54M-parameter) language model trained entirely from scratch, with no base model, for on-device use.

Created by: Malios Dark
Organization: Ideoa Labs
Parameters: ~54M
Language: English
License: Apache 2.0
Base model: none. Weights are randomly initialized and trained from scratch.

Isabel is a human name, chosen so the model feels approachable and personal, in line with Ideoa Labs' mission of accessible, on-device AI. It is not a fine-tune of any released model: the architecture, the byte-level BPE tokenizer, and every weight are our own.

How it is built

Isabel-50M is trained in two stages on a single GPU (RTX 3090 Ti):

From-scratch pretraining on open, permissively licensed educational English text, plus our own generated reasoning and arithmetic data. The tokenizer is digit-level (each digit is its own token) so the model can actually learn arithmetic; standard sub-word tokenizers make this much harder by merging multi-digit numbers into arbitrary multi-digit tokens.
Targeted fine-tuning on the official train splits of standard benchmark tasks (ARC, OpenBookQA, SciQ, QASC, CommonsenseQA, HellaSwag, WinoGrande), plus a large set of generated arithmetic. Only train splits are used, so no test data is seen during training.
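The card does not describe the format of the generated arithmetic data. As a rough illustration only, a seeded generator for such data could look like the sketch below; the function name `make_arithmetic_examples`, the `"a op b ="` prompt format, and the choice of operators are all hypothetical.

```python
import operator
import random

# Hypothetical operator set; the card does not list which operations were generated.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_arithmetic_examples(n: int, max_digits: int = 4, seed: int = 0) -> list[tuple[str, str]]:
    """Return n (prompt, answer) pairs such as ("37 + 905 =", "942").

    The prompt format is an illustrative assumption, not Isabel-50M's actual data format.
    """
    rng = random.Random(seed)  # a private, seeded RNG makes the dataset reproducible
    examples = []
    for _ in range(n):
        # Pick a digit budget first so small numbers are not vastly outnumbered by large ones.
        a = rng.randint(0, 10 ** rng.randint(1, max_digits) - 1)
        b = rng.randint(0, 10 ** rng.randint(1, max_digits) - 1)
        symbol = rng.choice(list(OPS))
        examples.append((f"{a} {symbol} {b} =", str(OPS[symbol](a, b))))
    return examples


for prompt, answer in make_arithmetic_examples(3):
    print(prompt, answer)
```

Fixing the seed means the same set can be regenerated later, for example to check it for overlap with evaluation data.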

The digit-level tokenizer is what gives Isabel-50M the highest arithmetic score in its size class.
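The card does not say how digit splitting is implemented. One common approach, shown here as a minimal sketch under that assumption, is to split every digit into its own piece before BPE runs, so no merge can ever fuse digits into a multi-digit token. The regex and the `pre_tokenize` name are illustrative; the Hugging Face `tokenizers` library offers a comparable built-in, `pre_tokenizers.Digits(individual_digits=True)`.

```python
import re

# Three alternatives that together cover every character:
# a single digit, a run of other non-space characters, or a run of whitespace.
_PIECE = re.compile(r"\d|[^\d\s]+|\s+")


def pre_tokenize(text: str) -> list[str]:
    """Split text so each digit is its own piece; BPE would then merge only within pieces."""
    return _PIECE.findall(text)


print(pre_tokenize("12+345=357"))
# ['1', '2', '+', '3', '4', '5', '=', '3', '5', '7']
```

Because each number becomes a sequence of single-digit tokens, "345" and "354" are built from the same three tokens in different positions, which gives the model a consistent view of place value.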

Architecture


Component	Value
Type	Decoder-only transformer
Hidden size	512
Layers	9
Heads	8
Vocab	32,000 (our own byte-level BPE, with each digit as its own token)
Context	1024
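The card gives the hidden size, depth, and vocabulary but not the feed-forward width, normalization, or whether input and output embeddings are shared. As a back-of-the-envelope check, the sketch below counts parameters under one set of assumptions that happens to land on the reported 54.1M: bias-free attention, a SwiGLU feed-forward block of width 2048, RMSNorm, parameter-free (e.g. rotary) positions, and tied embeddings. These are assumptions, not a description of the actual architecture.

```python
def count_params(
    vocab: int = 32_000,
    d_model: int = 512,
    n_layers: int = 9,
    d_ff: int = 2048,              # assumed feed-forward width
    tied_embeddings: bool = True,  # assumed: output head shares the input embedding
) -> int:
    """Rough parameter count for a decoder-only transformer under stated assumptions."""
    embedding = vocab * d_model                 # token embedding table
    lm_head = 0 if tied_embeddings else vocab * d_model
    attention = 4 * d_model * d_model           # Q, K, V, O projections, no biases
    feed_forward = 3 * d_model * d_ff           # SwiGLU: gate, up, down projections
    norms = 2 * d_model                         # two RMSNorm weight vectors per layer
    per_layer = attention + feed_forward + norms
    final_norm = d_model
    return embedding + lm_head + n_layers * per_layer + final_norm


print(count_params())                       # 54142464
print(count_params(tied_embeddings=False))  # 70526464
```

Under these assumptions the embedding table alone is about 30% of the model. Untying the embeddings would add another 16.4M parameters, and learned absolute position embeddings for a 1024-token context would add 524,288, so neither variant fits the reported size as closely.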

Evaluation (0-shot, full evaluation splits)

Benchmark	Isabel-50M
HellaSwag (acc_norm)	27.1
ARC-Easy (acc_norm)	43.8
ARC-Challenge (acc_norm)	23.5
PIQA (acc_norm)	57.3
ArithMark-2 (acc)	42.4
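Average (reported)	40.1

The unweighted mean of the five scores above is 38.8; the reported average of 40.1 is not this mean, and the task set or weighting behind it is not specified here.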
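As a check on the Average row, the unweighted mean of the five benchmark scores can be recomputed from the table values with the standard library.

```python
from statistics import mean

# Isabel-50M scores copied from the evaluation table (percent).
ISABEL_SCORES = {
    "HellaSwag (acc_norm)": 27.1,
    "ARC-Easy (acc_norm)": 43.8,
    "ARC-Challenge (acc_norm)": 23.5,
    "PIQA (acc_norm)": 57.3,
    "ArithMark-2 (acc)": 42.4,
}

five_task_mean = mean(ISABEL_SCORES.values())
print(f"{five_task_mean:.2f}")  # 38.82
```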

Comparison within the ~50M size class

Scores for the other models in this size class are taken from the public small-model leaderboard. Isabel-50M is trained from scratch on a single consumer GPU in a matter of hours, and has the highest reported average in the class.

Model	Params	Avg (reported)	HellaSwag	ARC-E	ARC-C	PIQA	ArithMark-2
Isabel-50M	~54M	40.1	27.1	43.8	23.5	57.3	42.4
Supra-1.5-50M-base-exp	51.8M	39.0	29.8	48.4	25.5	60.0	31.3
Supra-1.5-50M-Instruct-exp	51.8M	37.7	29.3	43.9	26.1	59.4	29.8
Veyra2-Apricot-50M-Base	49.3M	37.6	31.3	42.5	23.3	62.1	29.0
Quark-50M	56.7M	37.3	28.5	36.8	25.0	57.8	28.2
Supra-50M-Base	51.8M	37.1	31.8	45.9	25.0	62.5	27.0
Supra-50M-Instruct	51.8M	35.9	29.1	44.4	27.3	59.5	29.1
Shard-1	54.5M	35.6	29.2	41.1	21.0	58.2	26.8
Veyra-30M-Base	34.6M	34.7	27.9	35.9	24.2	58.9	26.8
Stentor3-50M	50.0M	32.5	27.1	29.7	21.7	53.8	29.5
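To see where the class ranking comes from, the per-benchmark leaders can be read off the table programmatically. The sketch below uses only the values in the table; `ROWS` and the helper names are illustrative.

```python
# (model, HellaSwag, ARC-E, ARC-C, PIQA, ArithMark-2), copied from the table above.
ROWS = [
    ("Isabel-50M", 27.1, 43.8, 23.5, 57.3, 42.4),
    ("Supra-1.5-50M-base-exp", 29.8, 48.4, 25.5, 60.0, 31.3),
    ("Supra-1.5-50M-Instruct-exp", 29.3, 43.9, 26.1, 59.4, 29.8),
    ("Veyra2-Apricot-50M-Base", 31.3, 42.5, 23.3, 62.1, 29.0),
    ("Quark-50M", 28.5, 36.8, 25.0, 57.8, 28.2),
    ("Supra-50M-Base", 31.8, 45.9, 25.0, 62.5, 27.0),
    ("Supra-50M-Instruct", 29.1, 44.4, 27.3, 59.5, 29.1),
    ("Shard-1", 29.2, 41.1, 21.0, 58.2, 26.8),
    ("Veyra-30M-Base", 27.9, 35.9, 24.2, 58.9, 26.8),
    ("Stentor3-50M", 27.1, 29.7, 21.7, 53.8, 29.5),
]
BENCHMARKS = ["HellaSwag", "ARC-E", "ARC-C", "PIQA", "ArithMark-2"]


def leaders(rows):
    """Map each benchmark to the model with the highest score on it."""
    return {
        bench: max(rows, key=lambda row: row[col + 1])[0]
        for col, bench in enumerate(BENCHMARKS)
    }


def best_five_column_mean(rows):
    """Return the model with the highest unweighted mean over the five benchmark columns."""
    return max(rows, key=lambda row: sum(row[1:]) / len(BENCHMARKS))[0]


print(leaders(ROWS))
print(best_five_column_mean(ROWS))
```

On these values Isabel-50M leads only the ArithMark-2 column, and on the unweighted mean of the five columns Supra-1.5-50M-base-exp (39.0) is marginally ahead of Isabel-50M (38.8). The Avg column holds the leaderboard's reported figures, which for several rows differ from the five-column mean.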

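Isabel-50M has the highest reported average and the highest arithmetic score in the class, which we attribute to its digit-level tokenizer. On the four non-arithmetic benchmarks it trails the strongest models, most noticeably on commonsense completion (HellaSwag, 27.1, tied for lowest in the table); that is the honest limit of its size and short training budget.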

Citation

@misc{isabel_2026,
  title  = {Isabel-50M: A Tiny From-Scratch Language Model for the Edge},
  author = {Malios Dark},
  year   = {2026},
  note   = {Ideoa Labs}
}

Format: Safetensors
Model size: 54.1M params
Tensor type: BF16
