Qwen2.5-VL-3B Co-occurrence Context Classifier

Authors: Tianze Yang*, Tyson Jordan*, Ruitong Sun*, Ninghao Liu, Jin Sun *Equal contribution | Affiliation: University of Georgia

Overview

A fine-tuned Qwen2.5-VL-3B-Instruct model for detecting out-of-context objects based on the co-occurrence criterion.

Given an image with an object marked by a red bounding box, the model determines whether the object can reasonably appear together with other objects in the scene. If the combination is unusual or uncommon in real-world contexts, the object is classified as out-of-context.

How to Use

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch

model_id = "COinCO/Qwen2.5-VL-3B-Co_occurrence"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

Training Details

Parameter	Value
Base Model	Qwen2.5-VL-3B-Instruct
Method	LoRA fine-tuning (merged)
Dataset	COinCO
Training Samples	~5,000
Epochs	3
Learning Rate	2e-4

Evaluation Results

Inpainted Test Set (In-context vs Out-of-context)

Model	Accuracy	Precision	Recall	F1
Baseline (Qwen2.5-VL-3B)	75.54%	—	—	—
This model	80.82%	80.42%	79.87%	79.76%

Real COCO Images (shortcut learning detection, higher = better)

Model	Accuracy
Baseline	88.95%
This model	87.00%

Related Resources

Dataset: COinCO/COinCO-dataset
Code: YangTianze009/COinCO
Other models: Location | Size

Citation

@article{yang2025coinco,
  title={Common Inpainted Objects In-N-Out of Context},
  author={Tianze Yang and Tyson Jordan and Ruitong Sun and Ninghao Liu and Jin Sun},
  year={2025}
}

Downloads last month: 10

Safetensors

Model size

4B params

Tensor type

F16

Model tree for COinCO/Qwen2.5-VL-3B-Co_occurrence

Base model

Qwen/Qwen2.5-VL-3B-Instruct

Adapter

(105)

this model