nielsr (HF Staff) committed
Commit 10cba61 (verified) Β· Parent(s): 9f63d15

Improve model card and add metadata


This PR improves the model card for VPTracker. Key changes include:
- Added YAML metadata including `pipeline_tag: image-text-to-text`, `library_name: transformers`, and `base_model: Qwen/Qwen3-VL-4B-Instruct` (an illustrative usage sketch based on this metadata follows this list).
- Added links to the research paper [VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM](https://huggingface.co/papers/2512.22799) and the official [GitHub repository](https://github.com/jcwang0602/VPTracker).
- Included a brief summary of the model's functionality based on the paper abstract.
- Retained and organized the installation and citation sections for better readability.
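To illustrate what the new `pipeline_tag: image-text-to-text` and `library_name: transformers` metadata imply for downstream use, here is a minimal, untested sketch. It assumes the checkpoint loads through the generic `transformers` image-text-to-text interface like its Qwen3-VL base; the repo id, image path, and prompt are placeholders, and the official GitHub repository remains the authoritative reference for inference.

```python
# Hypothetical usage sketch only: assumes the VPTracker checkpoint loads via the
# generic transformers image-text-to-text classes, like its Qwen3-VL-4B-Instruct base.
# The repo id, image path, and prompt below are placeholders, not from the model card.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "jcwang0602/VPTracker"  # placeholder repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# One video frame plus a language query describing the target to track.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "frame_0001.jpg"},
            {"type": "text", "text": "Locate the person in the red jacket and return a bounding box."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
answer = processor.batch_decode(
    output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Note that a Qwen3-VL-based checkpoint will likely require a much newer `transformers` release than the 4.37.2 shown in the README badge; follow the pinned versions in the repository's installation instructions.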

Files changed (1)

  1. README.md +22 -6
README.md CHANGED
@@ -1,6 +1,24 @@
+---
+pipeline_tag: image-text-to-text
+library_name: transformers
+base_model: Qwen/Qwen3-VL-4B-Instruct
+tags:
+- vision-language-tracking
+- multimodal
+- mllm
+- video
+---
+
 # VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM
 
-[![arXiv](https://img.shields.io/badge/Arxiv-2508.04107-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2512.22799)
+This repository contains the weights for **VPTracker**, the first global tracking framework based on Multimodal Large Language Models (MLLMs).
+
+VPTracker exploits the powerful semantic reasoning of MLLMs to locate targets across the entire image space. To address distractions from visually or semantically similar objects during global search, it introduces a location-aware visual prompting mechanism that incorporates spatial priors.
+
+- **Paper:** [VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM](https://huggingface.co/papers/2512.22799)
+- **Repository:** [GitHub - jcwang0602/VPTracker](https://github.com/jcwang0602/VPTracker)
+
+[![arXiv](https://img.shields.io/badge/Arxiv-2512.22799-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2512.22799)
 [![Python](https://img.shields.io/badge/Python-3.9-blue.svg)](https://www.python.org/downloads/)
 [![PyTorch](https://img.shields.io/badge/PyTorch-2.5.1-red.svg)](https://pytorch.org/)
 [![Transformers](https://img.shields.io/badge/Transformers-4.37.2-green.svg)](https://huggingface.co/docs/transformers/)
@@ -31,24 +49,22 @@ pip install flash-attn==2.7.4.post1 --no-build-isolation
 conda install av -c conda-forge
 pip install qwen_vl_utils qwen_omni_utils decord librosa icecream soundfile -U
 pip install liger_kernel nvitop pre-commit math_verify py-spy -U
-
 ```
 
 <!-- ## πŸ‘€ Visualization
 <img src="assets/Results.jpg" width="800"> -->
 
 ## πŸ™ Acknowledgments
-This code is developed on the top of [ms-swift](https://github.com/modelscope/ms-swift)
+This code is developed on top of [ms-swift](https://github.com/modelscope/ms-swift).
 
 ## βœ‰οΈ Contact
-
 Email: [email protected]. Any kind discussions are welcomed!
 
 ---
 
 ## πŸ“– Citation
-If our work is useful for your research, please consider cite:
-```
+If our work is useful for your research, please consider citing:
+```bibtex
 @misc{wang2025vptrackerglobalvisionlanguagetracking,
       title={VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM},
       author={Jingchao Wang and Kaiwen Zhou and Zhijian Wu and Kunhua Ji and Dingjiang Huang and Yefeng Zheng},