nielsr (HF Staff) committed
Commit e28d411 · verified · 1 parent: 63c59a1

Improve model card: Add pipeline tag, library name, paper and code links


This PR enhances the CodeV model card by adding key metadata and expanding the description for better discoverability and user experience.

Key improvements include:
- Adding `pipeline_tag: image-text-to-text` to improve discoverability for vision-language models on the Hugging Face Hub.
- Adding `library_name: transformers`, since the model is compatible with the Transformers library; this enables the automated "How to use" widget (a hedged usage sketch follows this list).
- Updating the paper link to point to the Hugging Face Papers page (`https://huggingface.co/papers/2511.19661`) for easier access and consistency.
- Including a direct link to the GitHub repository (`https://github.com/RenlyH/CodeV`) for users to access the code.
- Expanding the model description with key information from the paper's abstract, providing a more comprehensive overview of CodeV's purpose and capabilities.
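With `library_name: transformers` and `pipeline_tag: image-text-to-text` declared, the Hub can render a loading snippet automatically. For reference, here is a minimal, hedged usage sketch that assumes the checkpoint loads through the standard Qwen2.5-VL classes of its base model (`Qwen/Qwen2.5-VL-7B-Instruct`); the Hub id `RenlyH/CodeV`, the example image path, and the prompt are placeholder assumptions, not taken from this card.

```python
# Hedged sketch: load CodeV via the Qwen2.5-VL classes of its base model.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "RenlyH/CodeV"  # hypothetical Hub id; substitute the actual repository
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # placeholder image; supply your own
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What object is in the top-left corner of the image?"},
    ],
}]

# Build the chat prompt, then pack text + image into model inputs.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=256)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```

Once merged, the card's automated widget should surface the canonical snippet; the sketch above only illustrates the intended loading path.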

Please review and merge if these improvements are satisfactory!

Files changed (1)
  1. README.md (+10, −4)
README.md CHANGED
@@ -1,14 +1,20 @@
  ---
- license: mit
+ base_model:
+ - Qwen/Qwen2.5-VL-7B-Instruct
  datasets:
  - RenlyH/CodeV-RL-Data
  language:
  - en
  - zh
+ license: mit
  metrics:
  - accuracy
- base_model:
- - Qwen/Qwen2.5-VL-7B-Instruct
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

- The model CodeV is trained with TAPO described in [paper](https://arxiv.org/abs/2511.19661).
+ CodeV is a code-based visual agent trained with Tool-Aware Policy Optimization (TAPO) for faithful visual reasoning. This agentic vision-language model is designed to "think with images" by calling image operations, addressing unfaithful visual reasoning in prior models. CodeV achieves competitive accuracy and substantially increases faithful tool-use rates on visual search benchmarks, also demonstrating strong performance on multimodal reasoning and math benchmarks.
+
+ This model was presented in the paper [CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization](https://huggingface.co/papers/2511.19661).
+
+ Code: https://github.com/RenlyH/CodeV