Qwen2.5-VL-3B Co-occurrence Context Classifier
Authors: Tianze Yang*, Tyson Jordan*, Ruitong Sun*, Ninghao Liu, Jin Sun *Equal contribution | Affiliation: University of Georgia
Overview
A fine-tuned Qwen2.5-VL-3B-Instruct model for detecting out-of-context objects based on the co-occurrence criterion.
Given an image with an object marked by a red bounding box, the model determines whether the object can reasonably appear together with other objects in the scene. If the combination is unusual or uncommon in real-world contexts, the object is classified as out-of-context.
How to Use
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
model_id = "COinCO/Qwen2.5-VL-3B-Co_occurrence"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
model_id,
torch_dtype=torch.float16,
device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen2.5-VL-3B-Instruct |
| Method | LoRA fine-tuning (merged) |
| Dataset | COinCO |
| Training Samples | ~5,000 |
| Epochs | 3 |
| Learning Rate | 2e-4 |
Evaluation Results
Inpainted Test Set (In-context vs Out-of-context)
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Baseline (Qwen2.5-VL-3B) | 75.54% | โ | โ | โ |
| This model | 80.82% | 80.42% | 79.87% | 79.76% |
Real COCO Images (shortcut learning detection, higher = better)
| Model | Accuracy |
|---|---|
| Baseline | 88.95% |
| This model | 87.00% |
Related Resources
- Dataset: COinCO/COinCO-dataset
- Code: YangTianze009/COinCO
- Other models: Location | Size
Citation
@article{yang2025coinco,
title={Common Inpainted Objects In-N-Out of Context},
author={Tianze Yang and Tyson Jordan and Ruitong Sun and Ninghao Liu and Jin Sun},
year={2025}
}
- Downloads last month
- 10
Model tree for COinCO/Qwen2.5-VL-3B-Co_occurrence
Base model
Qwen/Qwen2.5-VL-3B-Instruct