DuckDB-NSQL-7B-v0.1-mlx

This is an MLX-optimized version of motherduckdb/DuckDB-NSQL-7B-v0.1, converted for efficient inference on Apple Silicon (M1/M2/M3/M4).

Model Description

DuckDB-NSQL-7B is a 7-billion parameter language model fine-tuned for generating DuckDB SQL queries from natural language questions. This MLX conversion provides significant performance improvements on Apple Silicon Macs compared to PyTorch CPU inference.

Conversion Details

  • Base Model: motherduckdb/DuckDB-NSQL-7B-v0.1
  • Precision: Float16 (FP16)
  • Framework: MLX
  • Optimized for: Apple Silicon (M1/M2/M3/M4)
  • Model Size: ~13.5 GB
  • Converted by: aikhan1

Installation

pip install mlx-lm
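
After installing, you can confirm that MLX sees the GPU with a quick check (mlx is pulled in as a dependency of mlx-lm):

import mlx.core as mx

# On Apple Silicon this should report Device(gpu, 0)
print(mx.default_device())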

Usage

Basic Inference

from mlx_lm import load, generate

# Load the model
model, tokenizer = load("aikhan1/DuckDB-NSQL-7B-v0.1-mlx")

# Example schema
schema = """
CREATE TABLE hospitals (
    hospital_id BIGINT PRIMARY KEY,
    hospital_name VARCHAR,
    region VARCHAR,
    bed_capacity INTEGER
);

CREATE TABLE patients (
    patient_id BIGINT PRIMARY KEY,
    full_name VARCHAR,
    gender VARCHAR,
    date_of_birth DATE,
    region VARCHAR
);
"""

# Example question
question = "How many patients are there in each region?"

# Build prompt
prompt = f"""You are an assistant that writes valid DuckDB SQL queries.

### Schema:
{schema}

### Question:
{question}

### Response (DuckDB SQL only):"""

# Generate SQL (greedy decoding is the default; older mlx-lm releases
# also accept temp=0.0 here, while newer ones take a sampler instead)
response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
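
The prompt asks for DuckDB SQL only, but completions can still trail off past the query, so it can help to trim the output at the first statement boundary. A minimal post-processing sketch (extract_sql is a hypothetical helper, not part of mlx-lm):

# Hypothetical cleanup helper: keep only the first SQL statement
def extract_sql(completion: str) -> str:
    sql = completion.strip()
    # Cut at the first semicolon in case the model keeps generating
    if ";" in sql:
        sql = sql.split(";", 1)[0] + ";"
    return sql

print(extract_sql(response))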

Using MLX Server

# Start the server
mlx_lm.server --model aikhan1/DuckDB-NSQL-7B-v0.1-mlx --port 8080

# In another terminal, make requests
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "CREATE TABLE patients(...)\n\nQuestion: Count patients by region\n\nSQL:",
    "max_tokens": 200,
    "temperature": 0
  }'
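
The same endpoint can be called from Python. A minimal sketch using the requests library, assuming the server above is running on port 8080 and returns an OpenAI-style completion body:

import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "prompt": "CREATE TABLE patients(...)\n\nQuestion: Count patients by region\n\nSQL:",
        "max_tokens": 200,
        "temperature": 0,
    },
    timeout=60,
)
resp.raise_for_status()
# Assumes an OpenAI-style body: {"choices": [{"text": "..."}], ...}
print(resp.json()["choices"][0]["text"])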

Performance Comparison

Speed

On Apple Silicon M-series chips:

Model   M1 Pro/Max       M2/M3 Series
FP16    ~30-50 tok/s     ~50-80 tok/s
8-bit   ~60-120 tok/s    ~120-200 tok/s
4-bit   ~90-180 tok/s    ~180-300 tok/s
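
These figures vary with prompt length and machine load, so it is worth measuring throughput on your own hardware. A rough sketch, assuming the model repo from the usage example (the generated-token count is approximated by re-tokenizing the completion):

import time
from mlx_lm import load, generate

model, tokenizer = load("aikhan1/DuckDB-NSQL-7B-v0.1-mlx")
prompt = "### Question:\nCount patients by region\n\n### Response (DuckDB SQL only):"

start = time.perf_counter()
response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
elapsed = time.perf_counter() - start

# Approximate generated-token count by re-tokenizing the completion
tokens = len(tokenizer.encode(response))
print(f"~{tokens / elapsed:.1f} tok/s")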

Memory Usage

  • FP16 model: ~13.5 GB
  • 8-bit model: ~7 GB (~50% reduction)
  • 4-bit model: ~4 GB (~70% reduction)

Quality

The FP16 version matches the original model's output quality, since no quantization is applied. It is the reference version when maximum quality matters.

Why FP16?

The FP16 version is ideal for:

✅ Maximum Accuracy: No quantization, full model precision
✅ Reference Quality: 100% of original model capability
✅ MLX Optimization: Still faster than PyTorch CPU inference
✅ Production Critical: When accuracy is paramount

Recommended for: setups with sufficient memory (16GB+ RAM) where maximum accuracy is required.

Trade-offs: larger and slower than the quantized versions, but with no quality loss.

Prompt Format

The model expects prompts in this format:

You are an assistant that writes valid DuckDB SQL queries.

### Schema:
CREATE TABLE table_name (
  column1 TYPE,
  column2 TYPE
);

### Question:
[Your natural language question]

### Response (DuckDB SQL only):
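
A small helper keeps this template consistent across calls (build_prompt is a hypothetical name, not part of mlx-lm):

def build_prompt(schema: str, question: str) -> str:
    # Mirrors the template above: instruction, schema, question, response header
    return (
        "You are an assistant that writes valid DuckDB SQL queries.\n\n"
        f"### Schema:\n{schema}\n\n"
        f"### Question:\n{question}\n\n"
        "### Response (DuckDB SQL only):"
    )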

Limitations

  • The model is trained specifically for DuckDB SQL syntax
  • Complex queries may require post-processing
  • The model may occasionally generate invalid SQL for complex schemas (see the validation sketch after this list)
  • Best performance on well-defined schemas with clear column names
  • Requires ~16GB+ RAM for comfortable inference
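
One way to catch invalid output before it reaches users is to dry-run the generated query against an in-memory DuckDB database. A minimal validation sketch using the duckdb Python package (pip install duckdb); EXPLAIN checks that the query parses and binds without actually executing it:

import duckdb

def is_valid_sql(schema: str, sql: str) -> bool:
    # Build the schema in a throwaway in-memory database,
    # then EXPLAIN the query to verify it parses and binds
    conn = duckdb.connect(":memory:")
    try:
        # Naive split is fine for plain CREATE TABLE DDL
        for stmt in schema.split(";"):
            if stmt.strip():
                conn.execute(stmt)
        conn.execute(f"EXPLAIN {sql}")
        return True
    except duckdb.Error as err:
        print(f"Invalid SQL: {err}")
        return False
    finally:
        conn.close()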

Model Versions

Version   Size       Speed   Quality   Use Case
FP16      ~13.5 GB   1x      100%      Maximum accuracy
8-bit     ~7 GB      2-3x    ~99%      Production (recommended)
4-bit     ~4 GB      3-4x    ~97%      Resource-constrained

Which Version Should I Use?

  • FP16: you have 16GB+ of RAM and accuracy is paramount
  • 8-bit: the best balance of speed and quality for most production workloads
  • 4-bit: memory-constrained machines where a small quality loss is acceptable

License

This model inherits the Llama 2 Community License Agreement from the base model.

Citation

@misc{duckdb-nsql-mlx,
  title={DuckDB-NSQL-7B MLX Conversion},
  author={aikhan1},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/aikhan1/DuckDB-NSQL-7B-v0.1-mlx}}
}

Original model:

@misc{duckdb-nsql,
  title={DuckDB-NSQL-7B: Natural Language to SQL for DuckDB},
  author={MotherDuck},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/motherduckdb/DuckDB-NSQL-7B-v0.1}}
}

Acknowledgments

Thanks to MotherDuck for training and releasing the original DuckDB-NSQL-7B model, and to Apple's MLX team for the framework that makes this conversion possible.
