Improve model card and add metadata

This PR improves the model card for VPTracker. Key changes include:
- Added YAML metadata including `pipeline_tag: image-text-to-text`, `library_name: transformers`, and `base_model: Qwen/Qwen3-VL-4B-Instruct`.
- Added links to the research paper [VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM](https://huggingface.co/papers/2512.22799) and the official [GitHub repository](https://github.com/jcwang0602/VPTracker).
- Included a brief summary of the model's functionality based on the paper abstract.
- Retained the installation and citation sections, corrected the arXiv badge ID (`2508.04107` to `2512.22799`), fixed minor grammar, and tagged the citation code fence as `bibtex`.
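
For reviewers who want to sanity-check the new metadata, here is a minimal Python sketch that reads the front matter added in this PR. The `parse_front_matter` helper and the `CARD` string are illustrative assumptions and not part of the repository. The parser handles only the flat keys and the simple list used here, so a full YAML library would be the safer choice for anything more complex.

```python
# Minimal sketch: extract the YAML front matter of a model card.
# Assumption: only flat "key: value" pairs and simple "- item" lists occur,
# which is all this card's metadata uses. This is not a general YAML parser.

CARD = """---
pipeline_tag: image-text-to-text
library_name: transformers
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- vision-language-tracking
- multimodal
- mllm
- video
---
# VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM
"""


def parse_front_matter(text):
    """Return the metadata dict from a '---'-delimited block at the top of text."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta = {}
    current_key = None  # key of the list currently being filled, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta  # closing delimiter reached
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value
                current_key = None
            else:
                meta[key] = []  # an empty value starts a list
                current_key = key
    return {}  # front matter was never closed


if __name__ == "__main__":
    meta = parse_front_matter(CARD)
    print(meta["pipeline_tag"])  # image-text-to-text
    print(meta["tags"])
```

Keeping the check this small makes it easy to confirm that `pipeline_tag`, `library_name`, and `base_model` are present before merging.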

Files changed (1)

README.md +22 -6

@@ -1,6 +1,24 @@
 # VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM
-[![arXiv](https://img.shields.io/badge/Arxiv-2508.04107-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2512.22799)
 [![Python](https://img.shields.io/badge/Python-3.9-blue.svg)](https://www.python.org/downloads/)
 [![PyTorch](https://img.shields.io/badge/PyTorch-2.5.1-red.svg)](https://pytorch.org/)
 [![Transformers](https://img.shields.io/badge/Transformers-4.37.2-green.svg)](https://huggingface.co/docs/transformers/)
@@ -31,24 +49,22 @@ pip install flash-attn==2.7.4.post1 --no-build-isolation
 conda install av -c conda-forge
 pip install qwen_vl_utils qwen_omni_utils decord librosa icecream soundfile -U
 pip install liger_kernel nvitop pre-commit math_verify py-spy -U
 ```
 <!-- ## 👀 Visualization
 <img src="assets/Results.jpg" width="800"> -->
 ## 🙏 Acknowledgments
-This code is developed on the top of [ms-swift](https://github.com/modelscope/ms-swift)
 ## ✉️ Contact
 Email: [email protected]. Any kind discussions are welcomed!
 ---
 ## 📖 Citation
-If our work is useful for your research, please consider cite:
-```
 @misc{wang2025vptrackerglobalvisionlanguagetracking,
       title={VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM},
       author={Jingchao Wang and Kaiwen Zhou and Zhijian Wu and Kunhua Ji and Dingjiang Huang and Yefeng Zheng},

+---
+pipeline_tag: image-text-to-text
+library_name: transformers
+base_model: Qwen/Qwen3-VL-4B-Instruct
+tags:
+- vision-language-tracking
+- multimodal
+- mllm
+- video
+---
 # VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM
+This repository contains the weights for **VPTracker**, the first global tracking framework based on Multimodal Large Language Models (MLLMs).
+VPTracker exploits the powerful semantic reasoning of MLLMs to locate targets across the entire image space. To address distractions from visually or semantically similar objects during global search, it introduces a location-aware visual prompting mechanism that incorporates spatial priors.
+- **Paper:** [VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM](https://huggingface.co/papers/2512.22799)
+- **Repository:** [GitHub - jcwang0602/VPTracker](https://github.com/jcwang0602/VPTracker)
+[![arXiv](https://img.shields.io/badge/Arxiv-2512.22799-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2512.22799)
 [![Python](https://img.shields.io/badge/Python-3.9-blue.svg)](https://www.python.org/downloads/)
 [![PyTorch](https://img.shields.io/badge/PyTorch-2.5.1-red.svg)](https://pytorch.org/)
 [![Transformers](https://img.shields.io/badge/Transformers-4.37.2-green.svg)](https://huggingface.co/docs/transformers/)
 conda install av -c conda-forge
 pip install qwen_vl_utils qwen_omni_utils decord librosa icecream soundfile -U
 pip install liger_kernel nvitop pre-commit math_verify py-spy -U
 ```
 <!-- ## 👀 Visualization
 <img src="assets/Results.jpg" width="800"> -->
 ## 🙏 Acknowledgments
+This code is developed on top of [ms-swift](https://github.com/modelscope/ms-swift).
 ## ✉️ Contact
 Email: [email protected]. Any kind discussions are welcomed!
 ---
 ## 📖 Citation
+If our work is useful for your research, please consider citing:
+```bibtex
 @misc{wang2025vptrackerglobalvisionlanguagetracking,
       title={VPTracker: Global Vision-Language Tracking via Visual Prompt and MLLM},
       author={Jingchao Wang and Kaiwen Zhou and Zhijian Wu and Kunhua Ji and Dingjiang Huang and Yefeng Zheng},