The chat_template in tokenizer_config.json doesn't match the one vLLM uses

#11 opened by shanlinguoke

vLLM command:

```
vllm serve ERNIE-4.5-VL-28B-A3B-Thinking --trust-remote-code \
    --reasoning-parser ernie45 \
    --tool-call-parser ernie45 \
    --enable-auto-tool-choice --cpu-offload-gb 48
```
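For context, a multimodal request like the one below can be sent to the OpenAI-compatible endpoint that `vllm serve` exposes (a minimal sketch only; the base URL, API key, and image URL are placeholders I've assumed, not taken from the original run):

```python
# Minimal sketch: query the OpenAI-compatible server started by `vllm serve`.
# The base_url, api_key, and image URL are placeholders, not from the original post.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ERNIE-4.5-VL-28B-A3B-Thinking",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/artifact.jpg"}},
                {"type": "text", "text": "From which era does the artifact in the image originate?"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```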

Tracing the input that actually reaches the model, the result is:
<|begin_of_sentence|>You are a multimodal AI assistant called ERNIE developed by Baidu based on the PaddlePaddle framework.\nUser: <|IMAGE_START|><|image@placeholder|><|IMAGE_END|>\nFrom which era does the artifact in the image originate?\nAssistant: \n\n

Whereas rendering the chat_template read directly from tokenizer_config.json gives:
<|begin_of_sentence|>You are a multimodal AI assistant called ERNIE developed by Baidu based on the PaddlePaddle framework.
User: Picture 1:<|IMAGE_START|><|image@placeholder|><|IMAGE_END|>From which era does the artifact in the image originate?
Assistant:

The output rendered directly from the chat_template has an extra "Picture 1:"; likewise, videos get an extra "Video 1:".
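The second rendering can be reproduced by applying the template stored in tokenizer_config.json directly (a minimal sketch; the model id/path and the multimodal message schema the template expects are my assumptions, not from the original post):

```python
# Sketch: render the prompt with the chat_template from tokenizer_config.json.
# Model id/path and message schema are assumptions; adjust to your local setup.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "ERNIE-4.5-VL-28B-A3B-Thinking", trust_remote_code=True
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "From which era does the artifact in the image originate?"},
        ],
    }
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # expected to show the "Picture 1:<|IMAGE_START|>..." form
```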

Although the model's output is barely affected by which of these two inputs is used, in the spirit of rigor: which one is correct?

BAIDU org

The second one is correct.
