Improve model card: Add pipeline tag, library name, paper and code links
This PR enhances the model card for the CodeV model by adding crucial metadata and expanding the content for better discoverability and user experience.
Key improvements include:
- Adding `pipeline_tag: image-text-to-text` to improve discoverability for vision-language models on the Hugging Face Hub.
- Adding `library_name: transformers`, since the model is compatible with the Transformers library; this enables the automated "How to use" widget.
- Updating the paper link to point to the Hugging Face Papers page (`https://huggingface.co/papers/2511.19661`) for easier access and consistency.
- Including a direct link to the GitHub repository (`https://github.com/RenlyH/CodeV`) for users to access the code.
- Expanding the model description with key information from the paper's abstract, providing a more comprehensive overview of CodeV's purpose and capabilities.
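
As a rough illustration of the metadata changes listed above, the following sketch checks that a card's front matter defines `pipeline_tag` and `library_name`. The helper names (`front_matter_keys`, `missing_hub_fields`) are hypothetical, and the parser is deliberately minimal (flat `key: value` pairs only); it is not part of this PR or of any Hub tooling.

```python
import re

# Front matter as it appears on the "+" side of this PR's diff,
# followed by the first sentence of the new description.
CARD = """---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- RenlyH/CodeV-RL-Data
language:
- en
- zh
license: mit
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---

CodeV is a code-based visual agent trained with Tool-Aware Policy Optimization (TAPO) for faithful visual reasoning.
"""


def front_matter_keys(card_text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a card's YAML front matter.

    Hypothetical helper: it only reads flat scalar values and skips list
    items, which is enough for the two fields checked below.
    """
    match = re.match(r"^---\n(.*?)\n---\n", card_text, flags=re.DOTALL)
    if match is None:
        return {}
    pairs: dict[str, str] = {}
    for line in match.group(1).splitlines():
        key_value = re.match(r"^([A-Za-z_]+):\s*(.*)$", line)
        if key_value:
            pairs[key_value.group(1)] = key_value.group(2)
    return pairs


def missing_hub_fields(card_text: str,
                       required: tuple[str, ...] = ("pipeline_tag", "library_name")) -> list[str]:
    """List the required fields that are absent or empty in the front matter."""
    pairs = front_matter_keys(card_text)
    return [name for name in required if not pairs.get(name)]


if __name__ == "__main__":
    print(front_matter_keys(CARD)["pipeline_tag"])
    print(missing_hub_fields(CARD))
```

A real check would more likely use a full YAML parser; the regular expressions here are only meant to keep the sketch dependency-free.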
Please review and merge if these improvements are satisfactory!
@@ -1,14 +1,20 @@
 ---
-
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
 datasets:
 - RenlyH/CodeV-RL-Data
 language:
 - en
 - zh
+license: mit
 metrics:
 - accuracy
-
-
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---

-
+CodeV is a code-based visual agent trained with Tool-Aware Policy Optimization (TAPO) for faithful visual reasoning. This agentic vision-language model is designed to "think with images" by calling image operations, addressing unfaithful visual reasoning in prior models. CodeV achieves competitive accuracy and substantially increases faithful tool-use rates on visual search benchmarks, also demonstrating strong performance on multimodal reasoning and math benchmarks.
+
+This model was presented in the paper [CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization](https://huggingface.co/papers/2511.19661).
+
+Code: https://github.com/RenlyH/CodeV