The chat_template in tokenizer_config.json doesn't match the one vLLM uses

#11 opened by shanlinguoke

vLLM command:

```
vllm serve ERNIE-4.5-VL-28B-A3B-Thinking --trust-remote-code \
    --reasoning-parser ernie45 \
    --tool-call-parser ernie45 \
    --enable-auto-tool-choice --cpu-offload-gb 48
```
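For context, a multimodal request like the one below can be sent to the OpenAI-compatible endpoint that `vllm serve` exposes (a minimal sketch only; the base URL, API key, and image URL are placeholders I've assumed, not taken from the original run):

```python
# Minimal sketch: query the OpenAI-compatible server started by `vllm serve`.
# The base_url, api_key, and image URL are placeholders, not from the original post.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ERNIE-4.5-VL-28B-A3B-Thinking",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/artifact.jpg"}},
                {"type": "text", "text": "From which era does the artifact in the image originate?"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```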

Tracing the input that actually reaches the model, the result is:
<|begin_of_sentence|>You are a multimodal AI assistant called ERNIE developed by Baidu based on the PaddlePaddle framework.\nUser: <|IMAGE_START|><|image@placeholder|><|IMAGE_END|>\nFrom which era does the artifact in the image originate?\nAssistant: \n\n

Whereas rendering the chat_template read directly from tokenizer_config.json gives:
<|begin_of_sentence|>You are a multimodal AI assistant called ERNIE developed by Baidu based on the PaddlePaddle framework.
User: Picture 1:<|IMAGE_START|><|image@placeholder|><|IMAGE_END|>From which era does the artifact in the image originate?
Assistant:

The output rendered directly from the chat_template has an extra "Picture 1:"; likewise, videos get an extra "Video 1:".
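The second rendering can be reproduced by applying the template stored in tokenizer_config.json directly (a minimal sketch; the model id/path and the multimodal message schema the template expects are my assumptions, not from the original post):

```python
# Sketch: render the prompt with the chat_template from tokenizer_config.json.
# Model id/path and message schema are assumptions; adjust to your local setup.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "ERNIE-4.5-VL-28B-A3B-Thinking", trust_remote_code=True
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "From which era does the artifact in the image originate?"},
        ],
    }
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # expected to show the "Picture 1:<|IMAGE_START|>..." form
```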

Although the model's output is barely affected by which of these two inputs is used, in the spirit of rigor: which one is correct?

BAIDU org

The second one is correct.
