Zenyx-Base-220M: High-Density Nano Foundation Model

Zenyx-Base-220M is a 220 million parameter causal language model built from scratch using JAX/Flax on Kaggle TPU v5e-8.

Unlike typical small models trained on limited data, Zenyx-Base was trained on ~153 billion tokens, far exceeding the Chinchilla-optimal point for this parameter count. This "over-training" strategy was employed to maximize the information density and logic capabilities of the weights, creating a robust foundation for reasoning tasks.
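
For scale, a quick back-of-the-envelope comparison against the commonly cited Chinchilla heuristic of roughly 20 training tokens per parameter (the heuristic itself is approximate, so these figures are illustrative):

# Rough token-budget comparison (illustrative; assumes the ~20 tokens/parameter
# Chinchilla rule of thumb).
params = 220e6
tokens_trained = 153e9

chinchilla_optimal = 20 * params              # ~4.4B tokens
tokens_per_param = tokens_trained / params    # ~700 tokens per parameter

print(f"Chinchilla-optimal budget: ~{chinchilla_optimal / 1e9:.1f}B tokens")
print(f"Actual budget: {tokens_trained / 1e9:.0f}B tokens "
      f"(~{tokens_per_param:.0f} tokens/parameter, "
      f"~{tokens_trained / chinchilla_optimal:.0f}x the optimal point)")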

🧠 Model Description

  • Architecture: Custom Llama-style Transformer (RoPE, SwiGLU, RMSNorm, Grouped Query Attention); a minimal sketch of the SwiGLU and RMSNorm blocks follows this list.
  • Tokenizer: Qwen 2.5 tokenizer (151,646-token vocabulary, matching the spec table below) for high compression efficiency.
  • Context Window: 2048 Tokens.
  • Training Hardware: TPU v5e-8.
  • Final Validation Loss: ~2.38 (Exceptional convergence for 220M).
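
The following is a minimal Flax sketch of the SwiGLU feed-forward block and RMSNorm named above. It is a generic formulation with dimensions taken from the spec table, not the exact definitions from the training script:

import jax.numpy as jnp
import flax.linen as nn

class RMSNorm(nn.Module):
    # Root-mean-square layer norm: x / rms(x) * scale
    dim: int
    eps: float = 1e-6

    @nn.compact
    def __call__(self, x):
        scale = self.param("scale", nn.initializers.ones, (self.dim,))
        rms = jnp.sqrt(jnp.mean(jnp.square(x), axis=-1, keepdims=True) + self.eps)
        return (x / rms) * scale

class SwiGLU(nn.Module):
    # Llama-style gated MLP: down(silu(gate(x)) * up(x))
    hidden_dim: int = 768
    mlp_dim: int = 3072

    @nn.compact
    def __call__(self, x):
        gate = nn.Dense(self.mlp_dim, use_bias=False)(x)
        up = nn.Dense(self.mlp_dim, use_bias=False)(x)
        return nn.Dense(self.hidden_dim, use_bias=False)(nn.silu(gate) * up)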

Technical Specifications

| Hyperparameter  | Value   |
|-----------------|---------|
| Layers          | 12      |
| Hidden Dim      | 768     |
| MLP Dim         | 3072    |
| Attention Heads | 12      |
| KV Heads        | 4 (GQA) |
| Vocab Size      | 151,646 |
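
As a sanity check, the figures in the table roughly reproduce the 220M total. The arithmetic below assumes tied input/output embeddings, a head dimension of 768 / 12 = 64, and bias-free projections; these are assumptions, not details confirmed by the training script:

# Approximate parameter count from the spec table (assumes tied embeddings,
# head_dim = 64, and bias-free projections).
vocab, d, mlp_dim, layers = 151_646, 768, 3072, 12
n_heads, n_kv_heads, head_dim = 12, 4, 64

embed = vocab * d                        # ~116.5M (shared with the LM head if tied)
attn = d * n_heads * head_dim            # Q projection
attn += 2 * d * n_kv_heads * head_dim    # K and V projections (GQA)
attn += n_heads * head_dim * d           # output projection
ffn = 3 * d * mlp_dim                    # SwiGLU gate, up, and down matrices
total = embed + layers * (attn + ffn)

print(f"~{total / 1e6:.0f}M parameters")  # ~220M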

πŸ“š Training Curriculum (The "Omni-Mix")

The model was trained using a rigorous 4-stage curriculum designed to layer capabilities sequentially:

  1. Phase 1: Fundamentals (FineWeb-Edu)
    • Focus on high-quality educational English text to establish linguistic baselines.
  2. Phase 2: Logic & Structure (StarCoder - Python)
    • Introduction of code data to enforce logical indentation, syntax, and structured thinking.
  3. Phase 3: Multilingualism (FineWeb-2)
    • Exposure to 6 major languages (Hindi, Chinese, Russian, Japanese, French, Spanish) to expand the semantic embedding space.
  4. Phase 4: The Infinite Polish (Omni-Mix)
    • A weighted interleaving of all previous datasets plus OpenWebMath to converge the model's logic and language capabilities.
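
To illustrate what a weighted interleave looks like in practice, here is a sketch using the datasets library's interleave_datasets. The dataset repo IDs are guesses at public versions of the corpora named above, and the mixing probabilities are placeholders; the actual Omni-Mix ratios and pipeline are not published:

from datasets import load_dataset, interleave_datasets

# Hypothetical mixture -- repo IDs and weights are illustrative only.
web = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)
code = load_dataset("bigcode/starcoderdata", data_dir="python", split="train", streaming=True)
math = load_dataset("open-web-math/open-web-math", split="train", streaming=True)

omni_mix = interleave_datasets(
    [web, code, math],
    probabilities=[0.6, 0.25, 0.15],   # placeholder weights
    seed=42,
    stopping_strategy="all_exhausted",
)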

πŸ’» Usage

This model is a raw JAX/Flax checkpoint saved in .safetensors format. It uses a custom architecture definition and requires jax, flax, safetensors, and transformers to run.
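
The checkpoint can be fetched from the Hub with the huggingface_hub client before running the loading code below (the filename model.safetensors matches the one referenced in that snippet):

from huggingface_hub import hf_hub_download

# Download the raw Flax checkpoint used in the loading snippet below.
ckpt_path = hf_hub_download(repo_id="Arko007/Zenyx_Base_220M", filename="model.safetensors")
print(ckpt_path)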

Loading with JAX/Flax

import jax
import jax.numpy as jnp
import flax.linen as nn
from safetensors.flax import load_file
from transformers import AutoTokenizer

# 1. Define Architecture (Must match training config)
class TransformerLM(nn.Module):
    vocab_size: int
    embed_dim: int = 768
    num_layers: int = 12
    num_heads: int = 12
    num_kv_heads: int = 4
    mlp_dim: int = 3072
    max_length: int = 2048
    dropout_rate: float = 0.0
    
    # ... (Insert full model class definition here from the training script) ...

# 2. Load Resources
repo_id = "Arko007/Zenyx_Base_220M"
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", trust_remote_code=True)

# 3. Initialize & Load Weights
model = TransformerLM(vocab_size=len(tokenizer))
dummy_input = jnp.ones((1, 1), dtype=jnp.int32)
params = model.init(jax.random.PRNGKey(0), dummy_input)['params']

# Load the safetensors checkpoint (ensure model.safetensors is downloaded locally)
loaded_params = load_file("model.safetensors")
# Note: load_file returns a flat {name: array} dict; map it back onto the nested
# Flax 'params' tree (the exact key layout depends on how the checkpoint was saved)
# before calling model.apply.
print("Weights loaded successfully!")

⚠️ Limitations

  • Size: At 220M parameters, the model's knowledge retrieval capacity is limited compared to 7B+ models.
  • Base Model: This is a pre-trained base. It has not been fine-tuned for chat or instruction following (see Zenyx-DeepSeek-220M for the instruct version).
  • Hallucinations: While logically consistent, it may generate factually incorrect statements.

πŸ“œ Citation

@misc{ZenyxBase220M,
  title = {Zenyx-Base-220M: High-Density Foundation Model},
  author = {Arko007},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Arko007/Zenyx_Base_220M}
}