Jina Code Embeddings 0.5B - MLX

MLX port of Jina AI's code embedding model for Apple Silicon.

arXiv | Blog

Installation

pip install mlx tokenizers huggingface_hub

Usage

import mlx.core as mx
from tokenizers import Tokenizer
from model import JinaCodeEmbeddingModel
import json

# Load config
with open("config.json") as f:
    config = json.load(f)

# Load model
model = JinaCodeEmbeddingModel(config)
weights = mx.load("model.safetensors")
model.load_weights(list(weights.items()))
mx.eval(model.parameters())

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Encode a natural language query for code search
query_embeddings = model.encode(
    ["print hello world in python"],
    tokenizer,
    task="nl2code",
    prompt_type="query",
)

# Encode code passages
code_embeddings = model.encode(
    ["print('Hello World!')"],
    tokenizer,
    task="nl2code",
    prompt_type="passage",
)

mx.eval(query_embeddings, code_embeddings)
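To rank code passages against a query, compare the two sets of embeddings by cosine similarity. A minimal sketch using NumPy with random stand-ins for the real model output (the actual embeddings are 896-dimensional MLX arrays; whether they come pre-normalized is an assumption, so the helper normalizes defensively):

```python
import numpy as np

def cosine_similarity(a, b):
    # Normalize each row, then dot: correct whether or not inputs are pre-normalized.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Dummy stand-ins for query_embeddings / code_embeddings (shape: [n, 896]).
query = np.random.rand(1, 896).astype(np.float32)
passages = np.random.rand(3, 896).astype(np.float32)

scores = cosine_similarity(query, passages)  # shape (1, 3)
best = int(np.argmax(scores))
print(f"Best passage index: {best}")
```

With real embeddings, replace the dummy arrays with `np.array(query_embeddings)` and `np.array(code_embeddings)`.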

Task Types

Each task uses specific prefixes for queries and passages:

nl2code
  Query prefix:   Find the most relevant code snippet given the following query:
  Passage prefix: Candidate code snippet:

qa
  Query prefix:   Find the most relevant answer given the following question:
  Passage prefix: Candidate answer:

code2code
  Query prefix:   Find an equivalent code snippet given the following code snippet:
  Passage prefix: Candidate code snippet:

code2nl
  Query prefix:   Find the most relevant comment given the following code snippet:
  Passage prefix: Candidate comment:

code2completion
  Query prefix:   Find the most relevant completion given the following start of code snippet:
  Passage prefix: Candidate completion:
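These prefixes are prepended to the input text before tokenization. A hedged sketch of how a prompt could be assembled (the prefix strings come from the table above; the helper name and the exact separator after the colon are assumptions — the bundled model.py may already do this internally):

```python
# Query/passage prefixes per task, as listed in the table above.
TASK_PREFIXES = {
    "nl2code": {
        "query": "Find the most relevant code snippet given the following query: ",
        "passage": "Candidate code snippet: ",
    },
    "qa": {
        "query": "Find the most relevant answer given the following question: ",
        "passage": "Candidate answer: ",
    },
    "code2code": {
        "query": "Find an equivalent code snippet given the following code snippet: ",
        "passage": "Candidate code snippet: ",
    },
    "code2nl": {
        "query": "Find the most relevant comment given the following code snippet: ",
        "passage": "Candidate comment: ",
    },
    "code2completion": {
        "query": "Find the most relevant completion given the following start of code snippet: ",
        "passage": "Candidate completion: ",
    },
}

def build_prompt(text: str, task: str, prompt_type: str) -> str:
    # Hypothetical helper: prepend the task-specific prefix to the raw text.
    return TASK_PREFIXES[task][prompt_type] + text

print(build_prompt("print hello world in python", "nl2code", "query"))
```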

Matryoshka Dimensions

Embeddings can be truncated Matryoshka-style to any of these dimensions: 64, 128, 256, 512, or 896 (full).

embeddings = model.encode(texts, tokenizer, task="nl2code", prompt_type="query", truncate_dim=256)
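Truncation can also be applied after encoding: keep the leading k dimensions and re-normalize. A sketch with NumPy and a dummy 896-dimensional vector (re-normalizing after truncation is the standard Matryoshka recipe; whether `truncate_dim` does exactly this internally is an assumption):

```python
import numpy as np

def truncate_embedding(emb, dim):
    # Keep the leading `dim` components, then re-normalize to unit length.
    truncated = emb[..., :dim]
    return truncated / np.linalg.norm(truncated, axis=-1, keepdims=True)

full = np.random.rand(1, 896).astype(np.float32)  # stand-in for a real embedding
small = truncate_embedding(full, 256)
print(small.shape)  # (1, 256)
```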

Model Details

  • Architecture: Qwen2.5-Coder-0.5B
  • Parameters: 0.49B
  • Embedding dimension: 896
  • Max sequence length: 32768 tokens
  • Languages: 15+ programming languages
  • Optimized for: Apple Silicon (M1/M2/M3/M4) with Metal acceleration

Files

jina-code-embeddings-0.5b-mlx/
├── model.safetensors       # Model weights (float16)
├── model.py                # Model implementation
├── config.json             # Model configuration
├── tokenizer.json          # Tokenizer
├── tokenizer_config.json
├── vocab.json
├── merges.txt
└── README.md

Citation

@misc{kryvosheieva2025efficientcodeembeddingscode,
      title={Efficient Code Embeddings from Code Generation Models},
      author={Daria Kryvosheieva and Saba Sturua and Michael G\"unther and Scott Martens and Han Xiao},
      year={2025},
      eprint={2508.21290},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.21290},
}

License

CC BY-NC 4.0
