Agent trajectories vs synthetic data: error recovery patterns in production
Training on 425K curated agentic trajectories from Claude Opus 4.6 and GPT-5.x scaffolding is a compelling approach: learning read-before-write patterns and LSP diagnostic response from real agent traces rather than synthetic task descriptions.
The Terminal-Bench improvement (+61% over base Qwen3.5-9B) suggests these patterns transfer well. A few questions for production deployment:
The error recovery behavior: have you tested recovery rates when the model encounters novel error types not in the training trajectories? In my experience with LangGraph agents, models often struggle with unseen LSP errors that don't match their training distribution.
The minimal edit diffs vs full rewrites: this is exactly what production code agents need. Have you measured the token savings on typical edit operations? With a 262K context, edit efficiency directly impacts cost.
For the 425K trajectory dataset: what's the breakdown between successful and failed trajectories? Learning from failed attempts (with proper scaffolding) often improves robustness, but can also propagate bad patterns if not filtered.
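For anyone who wants a rough number on the edit-efficiency question above, here is a minimal sketch of how I would estimate it. It compares the size of a unified diff against re-emitting the whole file, using Python's standard `difflib`. The 4-characters-per-token heuristic, the `approx_tokens` and `edit_savings` helpers, and the 200-line sample file are all my own assumptions for illustration; none of it comes from the OmniCoder tokenizer or training setup, so treat the result as an order-of-magnitude estimate only.

```python
import difflib


def approx_tokens(text: str) -> int:
    """Crude token estimate: about 4 characters per token.

    This is an assumed heuristic, not the model's real tokenizer.
    """
    return max(1, len(text) // 4)


def edit_savings(before: str, after: str) -> dict:
    """Compare emitting a unified diff against emitting the full new file."""
    diff_lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="a/file.py",  # hypothetical file names, used only in the diff header
        tofile="b/file.py",
        n=1,  # one line of context around each change
    )
    diff_text = "".join(diff_lines)
    full_tokens = approx_tokens(after)
    diff_tokens = approx_tokens(diff_text)
    return {
        "full_rewrite_tokens": full_tokens,
        "diff_tokens": diff_tokens,
        # Fraction of output tokens saved by emitting the diff instead
        "savings_ratio": 1 - diff_tokens / full_tokens,
    }


# Hypothetical 200-line file with a single one-line change
before = "".join(f"value_{i} = {i}\n" for i in range(200))
after = before.replace("value_100 = 100\n", "value_100 = 101\n")

stats = edit_savings(before, after)
print(stats)
```

Swapping `approx_tokens` for the model's actual tokenizer and running this over real before/after file pairs from agent traces would give a more meaningful figure. The savings shrink quickly when edits touch many scattered lines, since each hunk carries its own header and context.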
Impressive GPQA Diamond results (83.8% pass@1). Looking forward to testing against agent orchestration benchmarks like BFCL.
For teams integrating OmniCoder: the Apache 2.0 license and GGUF availability make this a strong candidate for local coding agents where frontier model APIs aren't feasible.