Qwen2.5-0.5B GGUF for Text-to-SQL (CPU Inference)

This is a GGUF-format build of the Qwen2.5-0.5B text-to-SQL model, optimized for CPU inference with llama.cpp.

Model Details

  • Base Model: Qwen/Qwen2.5-0.5B
  • Format: GGUF f16 (float16 precision)
  • Parameters: 0.5B (qwen2 architecture)
  • File Size: 949MB
  • Optimized For: CPU inference with llama.cpp
  • Recommended RAM: 4GB+

Performance

Spider Benchmark (200 examples)

Metric                Score
------                -----
Exact Match            0.00%
Normalized Match       0.00%
Component Accuracy    91.94%
Average Similarity    21.78%
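
The metric definitions are not spelled out on this card; as a rough illustration (an assumption about the evaluation, not the actual harness), a normalized-match check might lowercase and collapse whitespace before comparing predicted and gold SQL, which is how normalized match can differ from exact match:

import re

def normalize_sql(sql: str) -> str:
    # Lowercase, collapse runs of whitespace, and drop a trailing
    # semicolon so formatting differences don't count as mismatches.
    sql = sql.strip().rstrip(";").lower()
    return re.sub(r"\s+", " ", sql)

def normalized_match(predicted: str, gold: str) -> bool:
    return normalize_sql(predicted) == normalize_sql(gold)

# Differs only in casing and spacing, so it counts as a match.
print(normalized_match("SELECT COUNT(*)  FROM singer;",
                       "select count(*) from singer"))  # True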

Usage with llama.cpp

Installation

# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download the model
huggingface-cli download vindows/qwen2.5-0.5b-text-to-sql-gguf qwen2.5-0.5b-text-to-sql-f16.gguf
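
To confirm the download completed and the file really is a GGUF model, you can check the 4-byte magic header; every GGUF file begins with the ASCII bytes "GGUF". A quick sanity check (the filename mirrors the command above):

# Verify the file starts with the GGUF magic bytes
with open("qwen2.5-0.5b-text-to-sql-f16.gguf", "rb") as f:
    magic = f.read(4)

assert magic == b"GGUF", f"Unexpected header: {magic!r}"
print("GGUF header OK")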

Run Inference

./llama-cli \
  -m qwen2.5-0.5b-text-to-sql-f16.gguf \
  -p "Convert the following natural language question to SQL:\n\nDatabase: concert_singer\nQuestion: How many singers do we have?\n\nSQL:" \
  -n 128 \
  --temp 0.1
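
For scripting, the same CLI call can be wrapped with Python's subprocess module. A minimal sketch, assuming the llama-cli binary and the model file sit in the working directory (splitting on "SQL:" is a heuristic for trimming the echoed prompt):

import subprocess

def generate_sql(question: str, database: str) -> str:
    # Same prompt template as the CLI example above.
    prompt = (
        "Convert the following natural language question to SQL:\n\n"
        f"Database: {database}\n"
        f"Question: {question}\n\n"
        "SQL:"
    )
    result = subprocess.run(
        ["./llama-cli",
         "-m", "qwen2.5-0.5b-text-to-sql-f16.gguf",
         "-p", prompt,
         "-n", "128",
         "--temp", "0.1"],
        capture_output=True, text=True, check=True,
    )
    # llama-cli echoes the prompt before the completion, so keep only
    # the text after the final "SQL:" marker.
    return result.stdout.split("SQL:")[-1].strip()

print(generate_sql("How many singers do we have?", "concert_singer"))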

Python Usage (llama-cpp-python)

pip install llama-cpp-python

from llama_cpp import Llama

# Load model
llm = Llama(
    model_path="qwen2.5-0.5b-text-to-sql-f16.gguf",
    n_ctx=2048,
    n_threads=8
)

# Generate SQL
prompt = """Convert the following natural language question to SQL:

Database: concert_singer
Question: How many singers do we have?

SQL:"""

output = llm(prompt, max_tokens=128, temperature=0.1, stop=["\n\n"])
sql = output['choices'][0]['text'].strip()
print(sql)
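
If you are issuing many queries, it helps to keep the prompt template in one place. The helper below is a hypothetical convenience wrapper (not part of this repo) that reuses the llm object created above and keeps only the first generated statement:

def build_prompt(question: str, database: str) -> str:
    # Same template the examples above use; centralizing it avoids
    # drift between experiments.
    return (
        "Convert the following natural language question to SQL:\n\n"
        f"Database: {database}\n"
        f"Question: {question}\n\n"
        "SQL:"
    )

def ask(llm, question: str, database: str) -> str:
    out = llm(build_prompt(question, database),
              max_tokens=128, temperature=0.1, stop=["\n\n"])
    # Keep only the first statement in case the model keeps generating.
    return out["choices"][0]["text"].strip().split(";")[0] + ";"

print(ask(llm, "How many singers do we have?", "concert_singer"))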

Quantization Options

This model is provided in f16 format. For smaller file sizes, at a slight cost in quality, you can quantize it further:

# Quantize to Q4_K_M (recommended for most use cases)
./llama-quantize qwen2.5-0.5b-text-to-sql-f16.gguf qwen2.5-0.5b-text-to-sql-Q4_K_M.gguf Q4_K_M

# Quantize to Q8_0 (higher quality, larger size)
./llama-quantize qwen2.5-0.5b-text-to-sql-f16.gguf qwen2.5-0.5b-text-to-sql-Q8_0.gguf Q8_0
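
To see what a given quantization actually costs on your hardware, here is a quick comparison sketch (the filenames follow the commands above; timings will vary by machine):

import os
import time
from llama_cpp import Llama

PROMPT = ("Convert the following natural language question to SQL:\n\n"
          "Database: concert_singer\n"
          "Question: How many singers do we have?\n\n"
          "SQL:")

for path in ["qwen2.5-0.5b-text-to-sql-f16.gguf",
             "qwen2.5-0.5b-text-to-sql-Q4_K_M.gguf"]:
    llm = Llama(model_path=path, n_ctx=2048, n_threads=8, verbose=False)
    start = time.time()
    out = llm(PROMPT, max_tokens=128, temperature=0.1, stop=["\n\n"])
    elapsed = time.time() - start
    size_mb = os.path.getsize(path) / 1e6
    print(f"{path}: {size_mb:.0f} MB, {elapsed:.1f}s -> "
          f"{out['choices'][0]['text'].strip()}")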

Files

  • qwen2.5-0.5b-text-to-sql-f16.gguf - F16 (full-precision GGUF) model weights (949MB)

Limitations

See the main model card for limitations.

License

Apache 2.0
