T-pro-it-2.1-int8-ov

An OpenVINO int8-quantized version of t-tech/T-pro-it-2.1.

πŸ“₯ Quick Start - Download & Run

1. Install the Hugging Face CLI:

pip install -U "huggingface_hub[cli]"

2. Download the model:

hf download savvadesogle/T-pro-it-2.1-int8-ov --local-dir ./T-pro-it-2.1-int8-ov
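If you prefer Python over the CLI, the same download can be scripted with the huggingface_hub API. A minimal sketch (the import is deferred so the helper can be defined even before huggingface_hub is installed):

```python
def download_model(repo_id: str = "savvadesogle/T-pro-it-2.1-int8-ov",
                   local_dir: str = "./T-pro-it-2.1-int8-ov") -> str:
    # Deferred import: requires `pip install -U huggingface_hub`.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repo and returns
    # the local directory containing them.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)

# Usage (downloads several GB of weights):
#   path = download_model()
```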

πŸ”§ Quantization Parameters

Weight compression was performed with optimum-cli export openvino using the following command:

optimum-cli export openvino ^
  --model ./T-pro-it-2.1 ^
  --task text-generation-with-past ^
  --weight-format int8 ^
  ./T-pro-it-2.1-int8-ov
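After export, the resulting IR can be smoke-tested from Python with Optimum Intel. A hedged sketch, assuming optimum-intel and transformers are installed (imports are deferred so defining the helper itself needs neither):

```python
def generate(prompt: str,
             model_dir: str = "./T-pro-it-2.1-int8-ov",
             max_new_tokens: int = 32) -> str:
    # Deferred imports: require `pip install optimum[openvino]`.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    # Load the OpenVINO IR produced by `optimum-cli export openvino`.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = OVModelForCausalLM.from_pretrained(model_dir)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Usage (loads the full model into memory):
#   print(generate("Hello!"))
```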

βœ… Compatibility

The provided OpenVINO IR model is compatible with:

  • OpenVINO version 2026.0.0.dev20260102
  • Optimum 2.1.0.dev0
  • Optimum Intel 1.27.0.dev0+25fcb63
  • NNCF 3.0.0.dev0+999c5e91
  • OpenArc

🎯 Running with OpenArc

OpenArc is an OpenAI-compatible inference server for OpenVINO models.

Terminal 1 β€” Start server:

set OPENARC_API_KEY=BIG_KEY
openarc serve start --host 127.0.0.1

Terminal 2 β€” Add model:

set OPENARC_API_KEY=BIG_KEY
openarc add --model-name T-pro-it-2.1 --model-path ./T-pro-it-2.1-int8-ov --engine ovgenai --model-type llm --device GPU

Load the model:

openarc load T-pro-it-2.1

Optionally, run a quick benchmark:

openarc bench T-pro-it-2.1

Connect via OpenAI API:

http://127.0.0.1:8000/v1/chat/completions

🌐 OpenAI API Example (Windows CMD)

curl http://127.0.0.1:8000/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -H "Authorization: Bearer BIG_KEY" ^
  -d "{\"model\":\"T-pro-it-2.1\",\"messages\":[{\"role\":\"user\",\"content\":\"Tell a joke\"}],\"temperature\":0.7,\"max_tokens\":128}"
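The same request can be issued from Python using only the standard library. A sketch, assuming the OpenArc server from the previous section is running at 127.0.0.1:8000 with the key BIG_KEY and the model registered as T-pro-it-2.1:

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/v1/chat/completions"
API_KEY = "BIG_KEY"  # must match OPENARC_API_KEY on the server

def build_payload(prompt: str, model: str = "T-pro-it-2.1",
                  temperature: float = 0.7, max_tokens: int = 128) -> dict:
    # OpenAI-style chat-completions body, mirroring the curl example.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    # POST the JSON payload with the bearer token and parse the reply.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

# Usage (requires the server to be running):
#   print(chat("Tell a joke"))
```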

πŸ“Š Performance Metrics

To measure throughput and latency, run the built-in benchmark against the loaded model:

openarc bench T-pro-it-2.1

⚠️ Limitations

See the original t-tech/T-pro-it-2.1 model card for limitations.

πŸ“„ Legal information

Distributed under the same license as the original model.


Model tree for savvadesogle/T-pro-it-2.1-int8-ov:

Qwen/Qwen3-32B (base) β†’ t-tech/T-pro-it-2.1 β†’ this model (int8 quantized)