How to use stelterlab/OlympicCoder-32B-AWQ with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "stelterlab/OlympicCoder-32B-AWQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stelterlab/OlympicCoder-32B-AWQ",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
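The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL = "stelterlab/OlympicCoder-32B-AWQ"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example, once the server is up:
#   print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should work the same way against an SGLang server on that port, since both servers expose the same OpenAI-compatible route.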
How to use stelterlab/OlympicCoder-32B-AWQ with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "stelterlab/OlympicCoder-32B-AWQ" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stelterlab/OlympicCoder-32B-AWQ",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
# Or start the SGLang server with Docker:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "stelterlab/OlympicCoder-32B-AWQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stelterlab/OlympicCoder-32B-AWQ",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
How to use stelterlab/OlympicCoder-32B-AWQ with Docker Model Runner:
docker model run hf.co/stelterlab/OlympicCoder-32B-AWQ
AWQ quantization: done by stelterlab in INT4 GEMM with AutoAWQ by casper-hansen (https://github.com/casper-hansen/AutoAWQ/)
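For readers who want to reproduce a comparable quantization, the sketch below follows the usage pattern from the AutoAWQ README. Only the 4-bit width and the GEMM version come from the note above; the `zero_point` and `q_group_size` values and the output directory name are assumptions, not the settings confirmed for this checkpoint.

```python
MODEL_PATH = "open-r1/OlympicCoder-32B"
QUANT_PATH = "OlympicCoder-32B-AWQ"  # hypothetical local output directory

# w_bit=4 and version="GEMM" match the INT4 GEMM description above;
# zero_point and q_group_size are assumed values, not taken from the card.
QUANT_CONFIG = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}


def quantize(model_path=MODEL_PATH, quant_path=QUANT_PATH, quant_config=QUANT_CONFIG):
    """Quantize a model with AutoAWQ and save it (downloads the full weights)."""
    # Imported lazily so the configuration can be inspected without AutoAWQ installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # Runs calibration and replaces the linear layers with 4-bit versions.
    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```

Calling `quantize()` on a 32B model needs enough memory to hold the unquantized weights, so it is not something to run casually.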
Original Weights by the open-r1 team. Original Model Card follows:
OlympicCoder-32B is a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.
Here's how you can run the model using the pipeline() function from 🤗 Transformers:
# pip install transformers
# pip install accelerate
import torch
from transformers import pipeline
pipe = pipeline("text-generation", model="open-r1/OlympicCoder-32B", torch_dtype=torch.bfloat16, device_map="auto")
# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
{"role": "user", "content": "Write a python program to calculate the 10th Fibonacci number"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=8000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
#<|im_start|>user
#Write a python program to calculate the 10th Fibonacci number<|im_end|>
#<|im_start|>assistant
#<think>Okay, I need to write a Python program that calculates the 10th Fibonacci number. Hmm, the Fibonacci sequence starts with 0 and 1. Each subsequent number is the sum of the two preceding ones. So the sequence goes: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, and so on. ...
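As the sample output shows, the model writes its reasoning after a `<think>` tag before giving the final answer. A small helper can separate the two. This is a sketch that assumes the reasoning is closed with a matching `</think>` tag, which the truncated sample above does not show; `split_reasoning` is an illustrative name, not part of Transformers.

```python
def split_reasoning(generated_text):
    """Return (reasoning, answer) from text containing a <think>...</think> block."""
    open_tag, close_tag = "<think>", "</think>"
    start = generated_text.find(open_tag)
    if start == -1:
        # No reasoning block at all: treat everything as the answer.
        return "", generated_text.strip()
    start += len(open_tag)
    end = generated_text.find(close_tag, start)
    if end == -1:
        # Generation stopped (e.g. hit max_new_tokens) before the reasoning ended.
        return generated_text[start:].strip(), ""
    reasoning = generated_text[start:end].strip()
    answer = generated_text[end + len(close_tag):].strip()
    return reasoning, answer
```

With the pipeline example above, `split_reasoning(outputs[0]["generated_text"])` would skip the echoed prompt, because the search starts at the first `<think>` tag.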
The following hyperparameters were used during training on 16 H100 nodes:
Base model: Qwen/Qwen2.5-32B