VoxMind
✨ Overview
Recent end-to-end spoken dialogue models have made natural voice interaction increasingly practical. However, as user requests become more complex and task-oriented, conversational ability alone is often not enough. To address real-world spoken tasks, these models must be equipped with agentic capabilities such as structured reasoning, tool use, and dynamic access to external functions.
VoxMind is an integrated framework designed to equip end-to-end spoken dialogue models with comprehensive agentic abilities. Built around a Think-before-Speak paradigm, VoxMind enables the model to internalize structured reasoning before response generation, which improves planning, tool selection, and spoken answer quality. In addition, to alleviate the latency bottleneck introduced by large-scale tool integration, VoxMind includes a Multi-Agent Dynamic Tool Management architecture that asynchronously delegates tool retrieval to an auxiliary agent aligned with the main model's reasoning trajectory.
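The asynchronous delegation idea can be illustrated with a small, self-contained sketch. It is not VoxMind's implementation: the tool catalogue, the keyword-overlap ranking, and the function names `retrieve_tools`, `reason`, and `handle_turn` are all hypothetical stand-ins chosen for illustration. The only point it shows is that retrieval can start in the background and overlap with the main model's reasoning instead of running before it.

```python
import asyncio

# Hypothetical tool catalogue; names and descriptions are illustrative only.
TOOL_CATALOGUE = {
    "get_weather": "Look up the current weather for a city",
    "book_flight": "Book a flight between two cities",
    "set_alarm": "Set an alarm at a given time",
}


async def retrieve_tools(query: str, top_k: int = 2) -> list[str]:
    """Auxiliary agent stand-in: rank tools by naive keyword overlap."""
    await asyncio.sleep(0.01)  # stands in for retrieval latency
    words = set(query.lower().split())
    ranked = sorted(
        TOOL_CATALOGUE,
        key=lambda name: len(words & set(TOOL_CATALOGUE[name].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


async def reason(query: str) -> str:
    """Main model stand-in for the Think-before-Speak reasoning step."""
    await asyncio.sleep(0.01)  # stands in for reasoning latency
    return f"The user asks: {query!r}. A tool may be needed."


async def handle_turn(query: str) -> tuple[str, list[str]]:
    # Start retrieval as a background task so it overlaps with reasoning.
    retrieval = asyncio.create_task(retrieve_tools(query))
    thought = await reason(query)
    tools = await retrieval  # usually finished by the time reasoning ends
    return thought, tools


thought, tools = asyncio.run(handle_turn("what is the weather for my city"))
print(thought)
print(tools)
```

Because the two coroutines run concurrently, the turn takes roughly as long as the slower of the two steps rather than their sum, which is the latency benefit the architecture is aiming for.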
🧪 Minimal Usage Example
from runtime import DEFAULT_SYSTEM_PROMPT, VoxMind

model = VoxMind("/path/to/VoxMind")

tools = []
system_prompt = model.build_system_prompt(
    DEFAULT_SYSTEM_PROMPT,
    tools,
    extra_context={"current_city": "Beijing", "user_language": "en"},
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What's the weather like in Beijing today?"},
]

response = model.generate(
    messages,
    post_think_prefix="After careful reasoning, here is my detailed answer:\n",
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)

print(response.think)
print(response.answer)
print(model.parse_tool_calls(response.answer))
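Once tool calls have been parsed, they still need to be routed to real functions. The sketch below shows one way that might look. It assumes, without confirmation from the repository, that each parsed call is a dict with a `name` and an `arguments` mapping; the actual return format of `parse_tool_calls` may differ. The `get_weather` function, `REGISTRY`, and `dispatch` are hypothetical names introduced only for this example.

```python
from typing import Any, Callable


def get_weather(city: str) -> str:
    """Hypothetical tool: returns a canned weather report."""
    return f"Sunny in {city}"


# Registry mapping tool names to local callables (illustrative only).
REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(tool_calls: list[dict[str, Any]]) -> list[Any]:
    """Run each call whose name is registered; report unknown names."""
    results = []
    for call in tool_calls:
        func = REGISTRY.get(call["name"])
        if func is None:
            results.append(f"unknown tool: {call['name']}")
            continue
        results.append(func(**call.get("arguments", {})))
    return results


# Assumed shape of the parser's output; the real format may differ.
calls = [{"name": "get_weather", "arguments": {"city": "Beijing"}}]
results = dispatch(calls)
print(results)
```

Reporting unknown tool names instead of raising keeps a single malformed call from aborting the whole turn; a stricter caller could raise instead.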
Citation
If this repository or its workflow design is helpful to your research, please cite or reference it appropriately.
@misc{liang2026voxmindendtoendagenticspoken,
      title={VoxMind: An End-to-End Agentic Spoken Dialogue System},
      author={Tianle Liang and Yifu Chen and Shengpeng Ji and Yijun Chen and Zhiyang Jia and Jingyu Lu and Fan Zhuo and Xueyi Pu and Yangzhuo Li and Zhou Zhao},
      year={2026},
      eprint={2604.15710},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2604.15710},
}