---
license: apache-2.0
language:
- en
base_model:
- tencent/HunyuanVideo
pipeline_tag: text-to-video
tags:
- avatar
- talking
- audio
---

# MoCha Demo Implementation

[MoCha](https://congwei1230.github.io/MoCha/) is a pioneering model for **Dialogue-driven Movie Shot Generation**.

| [**🌐Project Page**](https://congwei1230.github.io/MoCha/) | [**📖Paper**](https://arxiv.org/pdf/2503.23307) | [**🔗Github**](https://github.com/congwei1230/MoCha-Demo) | [**🤗Demo**](https://huggingface.co/datasets/CongWei1230/MoCha-Generation-on-MoChaBench-Visualizer) |

This repository provides a demo implementation of MoCha ("Towards Movie-Grade Talking Character Synthesis"), built on top of HunyuanVideo. We fine-tune HunyuanVideo on the Hallo3 dataset. Due to differences in training data, model scale, and training strategy, this demo does not fully reproduce the performance of the original MoCha model, but it reflects the core design and serves as a baseline for further research and study.

This implementation supports two generation modes:

- **st2v**: speech + text → video
- **sti2v**: image + speech + text → video

Check out the [**🔗Github**](https://github.com/congwei1230/MoCha-Demo) for usage.