GigaCheck-Classifier-Multi

🌐 LLMTrace Website | πŸ“œ LLMTrace Paper on arXiv | πŸ€— LLMTrace Classification Dataset | GitHub

Model Card

Model Description

This is the official GigaCheck-Classifier-Multi model from the LLMTrace project: a multilingual transformer-based model trained for binary classification of text as either human-written or AI-generated.

The model was trained jointly on the English and Russian portions of the LLMTrace Classification dataset. It is designed to be a robust baseline for detecting AI-generated content across multiple domains, text lengths, and prompt types.

For complete details on the training data, methodology, and evaluation, please refer to our research paper (link coming soon).
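A minimal usage sketch is shown below. It assumes the checkpoint loads as a standard Hugging Face sequence-classification model and that the two output logits are ordered [human, ai]; both the model head and the label order are assumptions, not confirmed by this card, so check the repository's `config.json` (`id2label`) before relying on them.

```python
import math

def ai_probability(logits):
    """Numerically stable softmax over a [human, ai] logit pair.

    The [human, ai] ordering is an assumption; verify it against the
    model's id2label mapping.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

if __name__ == "__main__":
    # Hypothetical inference path, assuming a sequence-classification head.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = "iitolstykh/GigaCheck-Classifier-Multi"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, torch_dtype=torch.bfloat16
    )
    model.eval()

    enc = tokenizer("Text to check.", return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits[0].float().tolist()
    print(f"P(ai) = {ai_probability(logits):.3f}")
```

The softmax helper is kept separate from the model call so the same post-processing can be reused with batched logits or a different loading path.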

Intended Use & Limitations

This model is intended for academic research, analysis of AI-generated content, and as a baseline for developing more advanced detection tools.

Limitations:

  • The model's performance may degrade on text generated by LLMs released after its training date (September 2025).
  • It is not infallible and can produce false positives (flagging human text as AI) and false negatives.
  • Performance may vary on domains or styles of text not well-represented in the training data.

Evaluation

The model was evaluated on the test split of the LLMTrace Classification dataset, which was not seen during training. Performance metrics are reported below:

| Metric           | Value |
|------------------|-------|
| F1 Score (AI)    | 98.64 |
| F1 Score (Human) | 98.00 |
| Mean Accuracy    | 98.46 |
| TPR @ FPR=0.01   | 97.93 |
Citation

If you use this model in your research, please cite our papers:

@article{Layer2025LLMTrace,
  title={{LLMTrace: A Corpus for Classification and Fine-Grained Localization of AI-Written Text}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2509.21269},
  year={2025}
}

@article{tolstykh2024gigacheck,
  title={{GigaCheck: Detecting LLM-generated Content}},
  author={Irina Tolstykh and Aleksandra Tsybina and Sergey Yakubson and Aleksandr Gordeev and Vladimir Dokholyan and Maksim Kuprashevich},
  journal={arXiv preprint arXiv:2410.23728},
  year={2024}
}