Qwen3.5-2B + metro-v23 LoRA

Domain-specialised tool-using agent for transit-kiosk tasks: routing, fare calculation, disruption advisories, accessibility, multilingual cultural notes, multi-turn context tracking, and policy adaptation across 6 metro systems (MARTA, BART, CTA, Doha, Taipei MRT, Beijing Subway).

QLoRA r=16 fine-tune of Qwen/Qwen3.5-2B on 790 distilled traces from Qwen3.5-27B and Qwen3.5-35B-A3B teachers (filtered to tier1 ≥ 90% per case, deduplicated by case_id, evaluated on the MetroLLM-Bench v23 harness).

Files

File Purpose
Qwen3.5-2B-metro-v23-Q4_K_M.gguf (1.2 GB) Runtime artifact for llama.cpp / Ollama
adapter/ Raw LoRA adapter (use with PEFT + base Qwen3.5-2B)
training_summary.json Hyperparameters, seed, dataset version

Eval (v23, 6 systems, Haiku judge for Tier 2)

Cross-system average: Tier-1 81.5, Composite 80.1 (+6.2 T1 / +6.9 Comp vs base Qwen3.5-2B)

System Tier-1 %
MARTA 84.0
BART 80.7
CTA 82.7
DOHA 81.0
TAIPEI 80.5
BEIJING 80.1

Quickstart (llama.cpp)

huggingface-cli download continker/Qwen3.5-2B-metro-v23 \
  Qwen3.5-2B-metro-v23-Q4_K_M.gguf --local-dir ./models

llama-server -m ./models/Qwen3.5-2B-metro-v23-Q4_K_M.gguf \
  --port 8080 --ctx-size 32768 --n-gpu-layers 999

Quickstart (PEFT adapter, Python)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-2B", torch_dtype="bfloat16")
model = PeftModel.from_pretrained(base, "continker/Qwen3.5-2B-metro-v23", subfolder="adapter")
tokenizer = AutoTokenizer.from_pretrained("continker/Qwen3.5-2B-metro-v23", subfolder="adapter")

Training

  • Base: Qwen/Qwen3.5-2B
  • Method: QLoRA, rank=16, alpha=32, dropout=0.05
  • Targets: q/k/v/o + gate/up/down projections
  • Optimizer: AdamW, lr=2e-4, cosine, warmup 5%
  • Epochs: 3, effective batch 8 (per_device_train_batch_size=2 × grad_accum=4)
  • Max sequence length: 4096
  • Seed: 42 (default; multi-seed CI in progress for 27B)
  • Dataset: 790 distilled examples, see continker/metrollm-bench-train-data-v23

Limitations

  • Trained on 6 metro systems; generalisation to other systems untested.
  • Tool-use schema is specific to the MetroLLM-Bench mock server (route_planner, fare_calculator, station_info, disruption_feed, knowledge_base, submit_assistant_state).
  • Quantised to 4-bit (Q4_K_M); for full-precision behaviour use the adapter on bf16 base weights.

Citation

@misc{metrollm-bench-2026,
  title={MetroLLM-Bench: Evaluating LLMs as Prompt-Driven Transit Kiosk Agents},
  author={Hendriks, Remco and contributors},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/continker}}
}
Downloads last month
5
GGUF
Model size
2B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for continker/Qwen3.5-2B-metro-v23

Finetuned
Qwen/Qwen3.5-2B
Adapter
(92)
this model