MoCha Demo Implementation

MoCha is a pioneering model for Dialogue-driven Movie Shot Generation.

This repository provides a demo implementation of "MoCha: Towards Movie-Grade Talking Character Synthesis", built on top of HunyuanVideo.

We fine-tune HunyuanVideo on the Hallo3 dataset. Because the training data, model scale, and training strategy differ from those of the original MoCha model, this demo does not fully reproduce its performance. It does reflect the core design and serves as a baseline for further research and study.

This implementation supports two generation modes:

st2v: speech + text → video

sti2v: image + speech + text → video
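
As a rough illustration of how the two modes differ in their required inputs, here is a minimal Python sketch. It is not taken from the MoCha demo code: the names `GenerationRequest`, `REQUIRED_INPUTS`, and `missing_inputs` are hypothetical, and the only facts it relies on are the mode names and input lists given above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Inputs each mode needs, as listed above.
# This table is an illustration, not the repository's configuration.
REQUIRED_INPUTS = {
    "st2v": ("speech", "text"),
    "sti2v": ("image", "speech", "text"),
}


@dataclass
class GenerationRequest:
    """Hypothetical container for one generation call (not a MoCha API)."""
    mode: str
    text: Optional[str] = None    # prompt describing the shot
    speech: Optional[str] = None  # path to an audio file
    image: Optional[str] = None   # path to a reference image (sti2v only)


def missing_inputs(req: GenerationRequest) -> List[str]:
    """Return the names of required inputs that were not provided."""
    if req.mode not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode {req.mode!r}; expected one of {sorted(REQUIRED_INPUTS)}")
    return [name for name in REQUIRED_INPUTS[req.mode] if getattr(req, name) is None]
```

With this sketch, a request in `sti2v` mode that omits the image reports `["image"]` as missing, while the same inputs are complete for `st2v`.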

Check out the 🔗GitHub repository for usage instructions.

Model tree for CongWei1230/MoCha-Demo: fine-tuned from the base model tencent/HunyuanVideo.
