Qwen2.5-VL-3B Co-occurrence Context Classifier

Authors: Tianze Yang*, Tyson Jordan*, Ruitong Sun*, Ninghao Liu, Jin Sun *Equal contribution | Affiliation: University of Georgia

Overview

A fine-tuned Qwen2.5-VL-3B-Instruct model for detecting out-of-context objects based on the co-occurrence criterion.

Given an image with an object marked by a red bounding box, the model determines whether the object can reasonably appear together with other objects in the scene. If the combination is unusual or uncommon in real-world contexts, the object is classified as out-of-context.

How to Use

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch

model_id = "COinCO/Qwen2.5-VL-3B-Co_occurrence"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

Training Details

Parameter Value
Base Model Qwen2.5-VL-3B-Instruct
Method LoRA fine-tuning (merged)
Dataset COinCO
Training Samples ~5,000
Epochs 3
Learning Rate 2e-4

Evaluation Results

Inpainted Test Set (In-context vs Out-of-context)

Model Accuracy Precision Recall F1
Baseline (Qwen2.5-VL-3B) 75.54% โ€” โ€” โ€”
This model 80.82% 80.42% 79.87% 79.76%

Real COCO Images (shortcut learning detection, higher = better)

Model Accuracy
Baseline 88.95%
This model 87.00%

Related Resources

Citation

@article{yang2025coinco,
  title={Common Inpainted Objects In-N-Out of Context},
  author={Tianze Yang and Tyson Jordan and Ruitong Sun and Ninghao Liu and Jin Sun},
  year={2025}
}
Downloads last month
10
Safetensors
Model size
4B params
Tensor type
F16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for COinCO/Qwen2.5-VL-3B-Co_occurrence

Adapter
(105)
this model