Instructions to use Tesslate/OmniCoder-9B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Tesslate/OmniCoder-9B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages below contain an image, so use the image-text-to-text task
pipe = pipeline("image-text-to-text", model="Tesslate/OmniCoder-9B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Tesslate/OmniCoder-9B")
model = AutoModelForImageTextToText.from_pretrained("Tesslate/OmniCoder-9B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Tesslate/OmniCoder-9B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Tesslate/OmniCoder-9B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tesslate/OmniCoder-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
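
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="Tesslate/OmniCoder-9B",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Chat completions responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` sends the same question as the curl call. Passing `base_url="http://localhost:30000"` should target a server on another port in the same way.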

Use Docker

docker model run hf.co/Tesslate/OmniCoder-9B

SGLang

How to use Tesslate/OmniCoder-9B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Tesslate/OmniCoder-9B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tesslate/OmniCoder-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Tesslate/OmniCoder-9B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tesslate/OmniCoder-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Tesslate/OmniCoder-9B with Docker Model Runner:
```
docker model run hf.co/Tesslate/OmniCoder-9B
```

Agent trajectories vs synthetic data: error recovery patterns in production

#13

by O96a - opened Mar 30


Training on 425K curated agentic trajectories from Claude Opus 4.6 and GPT-5.x scaffolding is a compelling approach — learning read-before-write patterns and LSP diagnostic response from real agent traces rather than synthetic task descriptions.

The Terminal-Bench improvement (+61% over base Qwen3.5-9B) suggests these patterns transfer well. A few questions for production deployment:

1. The error recovery behavior: have you tested recovery rates when the model encounters novel error types not in the training trajectories? In my experience with LangGraph agents, models often struggle with unseen LSP errors that don't match their training distribution.
2. The minimal edit diffs vs full rewrites: this is exactly what production code agents need. Have you measured the token savings on typical edit operations? For 262K context, edit efficiency directly impacts cost.
3. For the 425K trajectory dataset: what's the breakdown between successful vs failed trajectories? Learning from failed attempts (with proper scaffolding) often improves robustness, but can also propagate bad patterns if not filtered.
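
To make the question about minimal edit diffs concrete, here is a rough sketch of how one might estimate the savings. It assumes whitespace-split word counts as a crude stand-in for a real tokenizer, and the 200-line file is a made-up example, not data from OmniCoder.

```python
import difflib


def approx_tokens(text):
    """Crude token proxy: whitespace-separated pieces (not a real tokenizer)."""
    return len(text.split())


def edit_cost(original, revised):
    """Return (full_rewrite_tokens, diff_tokens) for a single edit."""
    diff_lines = difflib.unified_diff(
        original.splitlines(),
        revised.splitlines(),
        lineterm="",
        n=1,  # one line of context on each side of the change
    )
    diff_text = "\n".join(diff_lines)
    return approx_tokens(revised), approx_tokens(diff_text)


# Hypothetical file: 200 short lines, exactly one of which changes.
original = "\n".join(f"value_{i} = {i}" for i in range(200))
revised = original.replace("value_100 = 100", "value_100 = 101")

full_tokens, diff_tokens = edit_cost(original, revised)
print(f"full rewrite: {full_tokens} tokens, diff: {diff_tokens} tokens")
```

Actual savings would depend on the model's tokenizer, the diff format the agent emits, and how localized the edits are.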

Impressive GPQA Diamond results (83.8% pass@1). Looking forward to testing against agent orchestration benchmarks like BFCL.

For teams integrating OmniCoder: the Apache 2.0 license and GGUF availability make this a strong candidate for local coding agents where frontier model APIs aren't feasible.
