
Chess Reasoner

A chess move prediction model fine-tuned from Qwen3-4B-Instruct-2507 to output structured reasoning before selecting a move.

Overview

This model is Phase 1 of a two-stage training pipeline:

  1. SFT (this model) — Align the model to output in a specific <think> + <uci_move> format
  2. GRPO (next step) — Reinforce with Stockfish rewards for stronger play

Output Format

<think>brief reasoning (1-2 sentences)</think>
<uci_move>e2e4</uci_move>
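
If you process the model's output downstream, the two tags can be pulled out with a regular expression. The helper below is an illustrative sketch, not code shipped with the model:

import re

def parse_response(text):
    """Extract the reasoning and UCI move from a formatted response.
    Returns (think, move); either can be None if its tag is missing."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    move = re.search(r"<uci_move>(.*?)</uci_move>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        move.group(1).strip() if move else None,
    )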

Usage

System Prompt

SYSTEM_PROMPT = """You are an expert chess player.

Given a current game state, you must select the best legal next move. Think in 1-2 sentences, then output your chosen move.

Output format:
<think>brief thinking (2 sentences max)</think>
<uci_move>your_move</uci_move>"""

User Prompt Template

The model expects the board state in the following format:

Here is the current game state
Board (Fen): <FEN string>
Turn: It is your turn (<white/black>)
Legal Moves: <comma-separated UCI moves>
Board:
<board representation>
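
If you build prompts programmatically, the template can be rendered from a python-chess board. The build_user_prompt helper below is an illustrative sketch, not part of the released code; the full example in the next section does the same thing inline:

import chess

def build_user_prompt(board: chess.Board) -> str:
    """Render the user prompt expected by the model from a python-chess Board."""
    return (
        "Here is the current game state\n"
        f"Board (Fen): {board.fen()}\n"
        f"Turn: It is your turn ({'white' if board.turn else 'black'})\n"
        f"Legal Moves: {', '.join(move.uci() for move in board.legal_moves)}\n"
        "Board:\n"
        f"{board}"
    )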

Full Inference Example

import chess
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained("nuriyev/chess-reasoner")
tokenizer = AutoTokenizer.from_pretrained("nuriyev/chess-reasoner")

# System prompt
SYSTEM_PROMPT = """You are an expert chess player.

Given a current game state, you must select the best legal next move. Think in 1-2 sentences, then output your chosen move.

Output format:
<think>brief thinking (2 sentences max)</think>
<uci_move>your_move</uci_move>"""

# Example position: after 1. e4
fen = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1"
board = chess.Board(fen)

# Build user prompt
user_content = f"""Here is the current game state
Board (Fen): {fen}
Turn: It is your turn ({'white' if board.turn else 'black'})
Legal Moves: {', '.join([move.uci() for move in board.legal_moves])}
Board:
{board}"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_content},
]

# Generate
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
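
To actually play the prediction, decode only the newly generated tokens, extract the move (for example with the parse_response sketch from the Output Format section), and check legality before pushing it onto the board. This continuation is illustrative, not part of the released code:

# Decode only the tokens generated after the prompt
completion = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

_, uci = parse_response(completion)  # helper sketched under Output Format
try:
    move = chess.Move.from_uci(uci) if uci else None
except ValueError:
    move = None

if move is not None and move in board.legal_moves:
    board.push(move)  # apply the predicted move
else:
    print(f"Model output did not contain a legal move: {completion!r}")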

Example Output

<think>The e5 pawn is undefended, so I will move my knight to e5 to challenge the center and set up a queen attack on g7.</think>
<uci_move>c7c5</uci_move>

Training Details

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | SFT with LoRA (r=32, α=64) |
| Dataset | nuriyev/chess-reasoning |
| Epochs | 2 |
| Learning Rate | 2e-4 |
| Batch Size | 16 |
| Max Seq Length | 1024 |

Trained using Unsloth with response-only loss masking.
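
Response-only loss masking means the cross-entropy loss is computed on the assistant completion only, not on the system and user prompt tokens. A framework-agnostic sketch of the idea, reusing tokenizer and messages from the usage example above (this is not the actual Unsloth training code):

# Prompt tokens get label -100, which the loss ignores, so only the
# assistant response (reasoning + move) contributes to the gradient.
prompt_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response_ids = tokenizer(
    "<think>Contest the center and open lines for development.</think>\n<uci_move>c7c5</uci_move>",
    add_special_tokens=False,
)["input_ids"]

input_ids = prompt_ids + response_ids
labels = [-100] * len(prompt_ids) + list(response_ids)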

Limitations

This SFT checkpoint is format-aligned but not yet optimized for move quality. The upcoming GRPO stage will train against Stockfish evaluations to improve actual chess performance.
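
For context, a Stockfish-based reward for the GRPO stage could look roughly like the sketch below. This is an assumption about the upcoming setup, not released training code; the engine path, search depth, penalty value, and scaling are placeholders:

import chess
import chess.engine

def stockfish_reward(fen: str, uci: str, depth: int = 12) -> float:
    """Illustrative reward: engine evaluation (in pawns, from the mover's side)
    of the position after the predicted move; illegal output gets a fixed penalty."""
    board = chess.Board(fen)
    try:
        move = chess.Move.from_uci(uci)
    except ValueError:
        return -1.0
    if move not in board.legal_moves:
        return -1.0  # penalize illegal or malformed moves

    engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # placeholder path
    try:
        board.push(move)
        info = engine.analyse(board, chess.engine.Limit(depth=depth))
        # PovScore relative to the side that just moved
        cp = info["score"].pov(not board.turn).score(mate_score=10000)
    finally:
        engine.quit()
    return cp / 100.0  # centipawns -> pawns

In practice the engine would be started once and reused across positions rather than launched per call.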

LoRA Adapter

Also available: nuriyev/chess-reasoner-lora
