DuckDB-NSQL-7B-v0.1-mlx

This is an MLX-optimized version of motherduckdb/DuckDB-NSQL-7B-v0.1, converted for efficient inference on Apple Silicon (M1/M2/M3/M4).

Model Description

DuckDB-NSQL-7B is a 7-billion parameter language model fine-tuned for generating DuckDB SQL queries from natural language questions. This MLX conversion provides significant performance improvements on Apple Silicon Macs compared to PyTorch CPU inference.

Conversion Details

  • Base Model: motherduckdb/DuckDB-NSQL-7B-v0.1
  • Precision: Float16 (FP16)
  • Framework: MLX
  • Optimized for: Apple Silicon (M1/M2/M3/M4)
  • Model Size: ~13.5 GB
  • Converted by: aikhan1

Installation

pip install mlx-lm
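
After installing, you can confirm that MLX sees the GPU with a quick check (mlx is pulled in as a dependency of mlx-lm):

import mlx.core as mx

# On Apple Silicon this should report Device(gpu, 0)
print(mx.default_device())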

Usage

Basic Inference

from mlx_lm import load, generate

# Load the model
model, tokenizer = load("aikhan1/DuckDB-NSQL-7B-v0.1-mlx")

# Example schema
schema = """
CREATE TABLE hospitals (
    hospital_id BIGINT PRIMARY KEY,
    hospital_name VARCHAR,
    region VARCHAR,
    bed_capacity INTEGER
);

CREATE TABLE patients (
    patient_id BIGINT PRIMARY KEY,
    full_name VARCHAR,
    gender VARCHAR,
    date_of_birth DATE,
    region VARCHAR
);
"""

# Example question
question = "How many patients are there in each region?"

# Build prompt
prompt = f"""You are an assistant that writes valid DuckDB SQL queries.

### Schema:
{schema}

### Question:
{question}

### Response (DuckDB SQL only):"""

# Generate SQL (greedy decoding is the default; older mlx-lm releases
# also accept temp=0.0 here, while newer ones take a sampler instead)
response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
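
The prompt asks for DuckDB SQL only, but completions can still trail off past the query, so it can help to trim the output at the first statement boundary. A minimal post-processing sketch (extract_sql is a hypothetical helper, not part of mlx-lm):

# Hypothetical cleanup helper: keep only the first SQL statement
def extract_sql(completion: str) -> str:
    sql = completion.strip()
    # Cut at the first semicolon in case the model keeps generating
    if ";" in sql:
        sql = sql.split(";", 1)[0] + ";"
    return sql

print(extract_sql(response))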

Using MLX Server

# Start the server
mlx_lm.server --model aikhan1/DuckDB-NSQL-7B-v0.1-mlx --port 8080

# In another terminal, make requests
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "CREATE TABLE patients(...)\n\nQuestion: Count patients by region\n\nSQL:",
    "max_tokens": 200,
    "temperature": 0
  }'
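
The same endpoint can be called from Python. A minimal sketch using the requests library, assuming the server above is running on port 8080 and returns an OpenAI-style completion body:

import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "prompt": "CREATE TABLE patients(...)\n\nQuestion: Count patients by region\n\nSQL:",
        "max_tokens": 200,
        "temperature": 0,
    },
    timeout=60,
)
resp.raise_for_status()
# Assumes an OpenAI-style body: {"choices": [{"text": "..."}], ...}
print(resp.json()["choices"][0]["text"])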

Performance Comparison

Speed

On Apple Silicon M-series chips:

Model   M1 Pro/Max       M2/M3 Series
FP16    ~30-50 tok/s     ~50-80 tok/s
8-bit   ~60-120 tok/s    ~120-200 tok/s
4-bit   ~90-180 tok/s    ~180-300 tok/s
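
These figures vary with prompt length and machine load, so it is worth measuring throughput on your own hardware. A rough sketch, assuming the model repo from the usage example (the generated-token count is approximated by re-tokenizing the completion):

import time
from mlx_lm import load, generate

model, tokenizer = load("aikhan1/DuckDB-NSQL-7B-v0.1-mlx")
prompt = "### Question:\nCount patients by region\n\n### Response (DuckDB SQL only):"

start = time.perf_counter()
response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
elapsed = time.perf_counter() - start

# Approximate generated-token count by re-tokenizing the completion
tokens = len(tokenizer.encode(response))
print(f"~{tokens / elapsed:.1f} tok/s")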

Memory Usage

  • FP16 model: ~13.5 GB
  • 8-bit model: ~7 GB (~50% reduction)
  • 4-bit model: ~4 GB (~70% reduction)

Quality

The FP16 version matches the original model's output quality, since no quantization is applied. It is the reference version when maximum quality matters.

Why FP16?

The FP16 version is ideal for:

✅ Maximum Accuracy: No quantization, full model precision
✅ Reference Quality: 100% of original model capability
✅ MLX Optimization: Still faster than PyTorch CPU inference
✅ Production Critical: When accuracy is paramount

Recommended for: setups with sufficient memory (16GB+ RAM) where maximum accuracy is required.

Trade-offs: larger and slower than the quantized versions, but with no quality loss.

Prompt Format

The model expects prompts in this format:

You are an assistant that writes valid DuckDB SQL queries.

### Schema:
CREATE TABLE table_name (
  column1 TYPE,
  column2 TYPE
);

### Question:
[Your natural language question]

### Response (DuckDB SQL only):
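
A small helper keeps this template consistent across calls (build_prompt is a hypothetical name, not part of mlx-lm):

def build_prompt(schema: str, question: str) -> str:
    # Mirrors the template above: instruction, schema, question, response header
    return (
        "You are an assistant that writes valid DuckDB SQL queries.\n\n"
        f"### Schema:\n{schema}\n\n"
        f"### Question:\n{question}\n\n"
        "### Response (DuckDB SQL only):"
    )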

Limitations

  • The model is trained specifically for DuckDB SQL syntax
  • Complex queries may require post-processing
  • The model may occasionally generate invalid SQL for complex schemas (see the validation sketch after this list)
  • Best performance on well-defined schemas with clear column names
  • Requires ~16GB+ RAM for comfortable inference
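
One way to catch invalid output before it reaches users is to dry-run the generated query against an in-memory DuckDB database. A minimal validation sketch using the duckdb Python package (pip install duckdb); EXPLAIN checks that the query parses and binds without actually executing it:

import duckdb

def is_valid_sql(schema: str, sql: str) -> bool:
    # Build the schema in a throwaway in-memory database,
    # then EXPLAIN the query to verify it parses and binds
    conn = duckdb.connect(":memory:")
    try:
        # Naive split is fine for plain CREATE TABLE DDL
        for stmt in schema.split(";"):
            if stmt.strip():
                conn.execute(stmt)
        conn.execute(f"EXPLAIN {sql}")
        return True
    except duckdb.Error as err:
        print(f"Invalid SQL: {err}")
        return False
    finally:
        conn.close()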

Model Versions

Version   Size       Speed   Quality   Use Case
FP16      ~13.5 GB   1x      100%      Maximum accuracy
8-bit     ~7 GB      2-3x    ~99%      Production (recommended)
4-bit     ~4 GB      3-4x    ~97%      Resource-constrained

Which Version Should I Use?

  • FP16: you have 16GB+ of RAM and accuracy is paramount
  • 8-bit: the best balance of speed and quality for most production workloads
  • 4-bit: memory-constrained machines where a small quality loss is acceptable

License

This model inherits the Llama 2 Community License Agreement from the base model.

Citation

@misc{duckdb-nsql-mlx,
  title={DuckDB-NSQL-7B MLX Conversion},
  author={aikhan1},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/aikhan1/DuckDB-NSQL-7B-v0.1-mlx}}
}

Original model:

@misc{duckdb-nsql,
  title={DuckDB-NSQL-7B: Natural Language to SQL for DuckDB},
  author={MotherDuck},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/motherduckdb/DuckDB-NSQL-7B-v0.1}}
}

Acknowledgments

Thanks to MotherDuck for training and releasing the original DuckDB-NSQL-7B model, and to Apple's MLX team for the framework that makes this conversion possible.
