VulnLLM-R-7B: Specialized Reasoning LLM for Vulnerability Detection

VulnLLM-R is the first reasoning Large Language Model (LLM) specialized for software vulnerability detection.

Unlike traditional static analysis tools (like CodeQL) or small LLMs that rely on simple pattern matching, VulnLLM-R is trained to reason step-by-step about data flow, control flow, and security context. It mimics the thought process of a human security auditor to identify complex logic vulnerabilities with high accuracy.

πŸ’‘ Key Features

  • Reasoning-Based Detection: Does not just classify code; it generates a "Chain-of-Thought" to analyze why a vulnerability exists.
  • Superior Accuracy: Outperforms much larger commercial models (e.g., Claude-3.7-Sonnet, o3-mini) and industry-standard tools (e.g., CodeQL, AFL++) on key benchmarks.
  • Efficiency: Achieves SOTA performance with only 7B parameters, making it 30x smaller and significantly faster than general-purpose reasoning models.
  • Broad Coverage: Trained and tested on C, C++, Python, and Java (zero-shot generalization).

πŸš€ Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "UCSB-SURFI/VulnLLM-R-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)
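# Note: device_map="auto" requires the `accelerate` package
# (pip install accelerate). On GPUs without bfloat16 support you may
# need to fall back to torch.float16.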

# Example Code Snippet
code_snippet = """
void vulnerable_function(char *input) {
    char buffer[50];
    strcpy(buffer, input); // Potential buffer overflow
}
"""

# Prompt Template (Triggering Reasoning)
prompt = f"""You are an advanced vulnerability detection model. 
Please analyze the following code step-by-step to determine if it contains a vulnerability.

Code:
{code_snippet}

Please provide your reasoning followed by the final answer.
"""

messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,  # passes input_ids and attention_mask together
    max_new_tokens=512
)
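# Reasoning traces can be long; if the chain-of-thought gets cut off
# before the final answer, raise max_new_tokens.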
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
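The response interleaves the model's reasoning with its conclusion. If you need a machine-readable label (e.g., for batch scanning), a small heuristic parser like the sketch below can help. The verdict keywords here are assumptions about the output format, not a documented API; adjust them to the traces you actually observe.

import re

def extract_verdict(response: str) -> str:
    """Heuristically map a reasoning trace to a label.

    Assumes the final answer appears near the end of the response and
    mentions 'vulnerable' / 'not vulnerable' -- tune these patterns to
    the actual model output.
    """
    # The final verdict usually comes last, so look at the tail only.
    tail = " ".join(response.strip().splitlines()[-5:]).lower()
    if re.search(r"not vulnerable|no vulnerabilit", tail):
        return "benign"
    if "vulnerab" in tail:
        return "vulnerable"
    return "unclear"

print(extract_verdict(response))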

πŸ“Š Performance

VulnLLM-R-7B achieves state-of-the-art results on benchmarks including PrimeVul, Juliet 1.3, and ARVO.

(See Figure 1 in the paper for the model-size vs. F1 comparison, and Table 4 for detailed metrics.)

πŸ“š Citation

If you use this model in your research, please cite our paper:

@article{nie2025vulnllmr,
  title={VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection},
  author={Nie, Yuzhou and Li, Hongwei and Guo, Chengquan and Jiang, Ruizhe and Wang, Zhun and Li, Bo and Song, Dawn and Guo, Wenbo},
  journal={arXiv preprint arXiv:2512.07533},
  year={2025}
}
πŸ”§ Model Details

  • Base model: Qwen/Qwen2.5-7B (fine-tuned)
  • Model size: 8B parameters
  • Tensor type: BF16 (Safetensors)