
Chess Reasoner

A chess move prediction model fine-tuned from Qwen3-4B-Instruct-2507 to output structured reasoning before selecting a move.

Overview

This model is Phase 1 of a two-stage training pipeline:

  1. SFT (this model) — Align the model to output in a specific <think> + <uci_move> format
  2. GRPO (next step) — Reinforce with Stockfish rewards for stronger play

Output Format

<think>brief reasoning (1-2 sentences)</think>
<uci_move>e2e4</uci_move>
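
If you process the model's output downstream, the two tags can be pulled out with a regular expression. The helper below is an illustrative sketch, not code shipped with the model:

import re

def parse_response(text):
    """Extract the reasoning and UCI move from a formatted response.
    Returns (think, move); either can be None if its tag is missing."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    move = re.search(r"<uci_move>(.*?)</uci_move>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        move.group(1).strip() if move else None,
    )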

Usage

System Prompt

SYSTEM_PROMPT = """You are an expert chess player.

Given a current game state, you must select the best legal next move. Think in 1-2 sentences, then output your chosen move.

Output format:
<think>brief thinking (2 sentences max)</think>
<uci_move>your_move</uci_move>"""

User Prompt Template

The model expects the board state in the following format:

Here is the current game state
Board (Fen): <FEN string>
Turn: It is your turn (<white/black>)
Legal Moves: <comma-separated UCI moves>
Board:
<board representation>
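
If you build prompts programmatically, the template can be rendered from a python-chess board. The build_user_prompt helper below is an illustrative sketch, not part of the released code; the full example in the next section does the same thing inline:

import chess

def build_user_prompt(board: chess.Board) -> str:
    """Render the user prompt expected by the model from a python-chess Board."""
    return (
        "Here is the current game state\n"
        f"Board (Fen): {board.fen()}\n"
        f"Turn: It is your turn ({'white' if board.turn else 'black'})\n"
        f"Legal Moves: {', '.join(move.uci() for move in board.legal_moves)}\n"
        "Board:\n"
        f"{board}"
    )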

Full Inference Example

import chess
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained("nuriyev/chess-reasoner")
tokenizer = AutoTokenizer.from_pretrained("nuriyev/chess-reasoner")

# System prompt
SYSTEM_PROMPT = """You are an expert chess player.

Given a current game state, you must select the best legal next move. Think in 1-2 sentences, then output your chosen move.

Output format:
<think>brief thinking (2 sentences max)</think>
<uci_move>your_move</uci_move>"""

# Example position: after 1. e4
fen = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1"
board = chess.Board(fen)

# Build user prompt
user_content = f"""Here is the current game state
Board (Fen): {fen}
Turn: It is your turn ({'white' if board.turn else 'black'})
Legal Moves: {', '.join([move.uci() for move in board.legal_moves])}
Board:
{board}"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_content},
]

# Generate
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
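
To actually play the prediction, decode only the newly generated tokens, extract the move (for example with the parse_response sketch from the Output Format section), and check legality before pushing it onto the board. This continuation is illustrative, not part of the released code:

# Decode only the tokens generated after the prompt
completion = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

_, uci = parse_response(completion)  # helper sketched under Output Format
try:
    move = chess.Move.from_uci(uci) if uci else None
except ValueError:
    move = None

if move is not None and move in board.legal_moves:
    board.push(move)  # apply the predicted move
else:
    print(f"Model output did not contain a legal move: {completion!r}")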

Example Output

<think>The e5 pawn is undefended, so I will move my knight to e5 to challenge the center and set up a queen attack on g7.</think>
<uci_move>c7c5</uci_move>

Training Details

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | SFT with LoRA (r=32, α=64) |
| Dataset | nuriyev/chess-reasoning |
| Epochs | 2 |
| Learning Rate | 2e-4 |
| Batch Size | 16 |
| Max Seq Length | 1024 |

Trained using Unsloth with response-only loss masking.
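
Response-only loss masking means the cross-entropy loss is computed on the assistant completion only, not on the system and user prompt tokens. A framework-agnostic sketch of the idea, reusing tokenizer and messages from the usage example above (this is not the actual Unsloth training code):

# Prompt tokens get label -100, which the loss ignores, so only the
# assistant response (reasoning + move) contributes to the gradient.
prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response_ids = tokenizer(
    "<think>Contest the center and open lines for development.</think>\n<uci_move>c7c5</uci_move>",
    add_special_tokens=False,
)["input_ids"]

input_ids = prompt_ids + response_ids
labels = [-100] * len(prompt_ids) + list(response_ids)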

Limitations

This SFT checkpoint is format-aligned but not yet optimized for move quality. The upcoming GRPO stage will train against Stockfish evaluations to improve actual chess performance.
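
For context, a Stockfish-based reward for the GRPO stage could look roughly like the sketch below. This is an assumption about the upcoming setup, not released training code; the engine path, search depth, penalty value, and scaling are placeholders:

import chess
import chess.engine

def stockfish_reward(fen: str, uci: str, depth: int = 12) -> float:
    """Illustrative reward: engine evaluation (in pawns, from the mover's side)
    of the position after the predicted move; illegal output gets a fixed penalty."""
    board = chess.Board(fen)
    try:
        move = chess.Move.from_uci(uci)
    except ValueError:
        return -1.0
    if move not in board.legal_moves:
        return -1.0  # penalize illegal or malformed moves

    engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # placeholder path
    try:
        board.push(move)
        info = engine.analyse(board, chess.engine.Limit(depth=depth))
        # PovScore relative to the side that just moved
        cp = info["score"].pov(not board.turn).score(mate_score=10000)
    finally:
        engine.quit()
    return cp / 100.0  # centipawns -> pawns

In practice the engine would be started once and reused across positions rather than launched per call.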

LoRA Adapter

Also available: nuriyev/chess-reasoner-lora
