GLiNER2 ONNX export

This bundle contains ONNX exports of fastino/gliner2-multi-v1:

  • model.onnx (FP32 export)
  • model_fp16.onnx (FP16 weights converted from FP32)
  • model_int8.onnx (dynamic INT8 quantization via onnxruntime)
  • tokenizer files copied verbatim from the HF model
  • a small config.json describing runtime constraints

The export script follows this design:

  • ONNX graph includes only encoder + span head tensor computation
  • Schema logic, label mapping, and decoding stay outside the graph
  • Inputs: input_ids, attention_mask (optionally token_type_ids)
  • Output: span_logits
  • Export with torch.onnx.export (opset 19) and dynamic batch/sequence axes
  • Convert FP32 weights to FP16 with convert_float_to_float16
  • Quantize with onnxruntime.quantization.quantize_dynamic(QInt8)

Usage

Enter the dev shell (adds Python + ONNX deps):

nix develop

Install the Python dependencies with Pipenv:

cd onnx
pipenv install
cd ..

Export (run from the onnx directory so Pipenv finds the Pipfile):

pipenv run python export.py \
  --model-id fastino/gliner2-multi-v1 \
  --output-dir gliner2-multi-v1

Validation is enabled by default: it compares the exported ONNX output to the PyTorch output on a dummy batch, and runs a small extraction-method check (entities, classification, JSON) using identical decoding logic. Useful flags:

pipenv run python export.py --no-validate             # skip all validation
pipenv run python export.py --no-validate-extraction  # skip only the extraction check
pipenv run python export.py --validate-quantized      # also validate the quantized model
pipenv run python export.py --no-fp16                 # skip the FP16 conversion
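The numeric part of validation is an elementwise comparison of the two span_logits tensors. A minimal sketch of that check (the tolerances here are illustrative, not the script's actual values):

```python
import numpy as np

def outputs_match(torch_logits, onnx_logits, atol=1e-4, rtol=1e-3):
    """True if ONNX and PyTorch span_logits agree within tolerance."""
    return bool(np.allclose(torch_logits, onnx_logits, atol=atol, rtol=rtol))
```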

The output directory will include:

  • model.onnx
  • model_fp16.onnx
  • model_int8.onnx
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • added_tokens.json
  • spm.model
  • config.json

Notes

  • The export uses a fixed max_seq_len (default 512) and expects inputs padded or truncated to that length. This matches the published bundle's runtime config.
  • The span_logits label axis is aligned to token positions in the input sequence. Use label marker token positions ([E], [C], [R], [L]) to map logits back to schema labels. Label mapping and decoding are intentionally handled outside the graph.
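A sketch of the fixed-length input preparation the first note implies: pad or truncate token ids to max_seq_len and build the matching attention_mask. The pad_id default here is illustrative; use the tokenizer's actual pad token id.

```python
def pad_or_truncate(input_ids, max_seq_len=512, pad_id=0):
    """Return (ids, attention_mask), both exactly max_seq_len long."""
    ids = list(input_ids)[:max_seq_len]   # truncate if too long
    mask = [1] * len(ids)                 # 1 = real token, 0 = padding
    pad = max_seq_len - len(ids)
    return ids + [pad_id] * pad, mask + [0] * pad
```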