Tencent Open Sources Speech-Driven Digital Human Model HunyuanVideo-Avatar

Tencent’s Hunyuan team has open-sourced HunyuanVideo-Avatar, a speech-driven digital human model. From a single portrait image and an audio clip, it automatically generates natural, smooth digital human videos in which the pictured character speaks or sings. Whether for short video creation, e-commerce advertising, or virtual hosting, HunyuanVideo-Avatar gives content creators and businesses a convenient way to produce digital human videos.

Video Demonstrations

  • Multi-scene female solo
  • Multi-scene dialogue example
  • Multi-style character demonstration

Key Features

  • Dynamic video generation from a single image and audio: Users upload just one portrait image and an audio clip. The model interprets the audio and generates a natural speaking or singing video, including facial expressions, lip sync, and full-body movements.
  • High fidelity and dynamic performance: Supports high-quality, dynamic digital human videos, covering head, half-body, and full-body movements.
  • Multi-style, multi-species, and dual-person support: Not only supports real humans but can also generate dynamic videos in various artistic styles (such as anime, ink painting) and different species (such as robots, animals), supporting multi-character interaction.
  • Emotion transfer and control: Can extract emotional cues from reference images and transfer them to the generated video, enabling detailed emotional style control.
  • Character consistency: A character image injection module keeps the character’s appearance highly consistent across the generated video while preserving natural motion.
  • Facial-aware audio adaptation: In multi-character scenarios, uses a facial-aware audio adapter for independent audio driving, supporting multi-character dialogues.
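
The single-image-plus-audio workflow described above can be sketched as a request structure. Everything below (class name, field names, accepted file formats, the `body_range` options) is illustrative only and is not the model’s actual API; consult the official HunyuanVideo-Avatar repository for the real interface.

```python
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class AvatarRequest:
    """Hypothetical input bundle for a speech-driven avatar generation call."""
    image_path: str                        # single portrait image of the character
    audio_path: str                        # driving speech or singing clip
    emotion_ref_path: Optional[str] = None # optional emotion-reference image
    body_range: str = "half"               # "head", "half", or "full" body motion

def validate(req: AvatarRequest) -> List[str]:
    """Return a list of problems with the request (empty list if it looks valid)."""
    problems = []
    if not req.image_path.lower().endswith((".png", ".jpg", ".jpeg")):
        problems.append("image_path should be a still image file")
    if not req.audio_path.lower().endswith((".wav", ".mp3")):
        problems.append("audio_path should be an audio file")
    if req.body_range not in ("head", "half", "full"):
        problems.append("body_range must be 'head', 'half', or 'full'")
    return problems
```

A request with a portrait image, an audio clip, and optionally an emotion-reference image mirrors the feature list above: one image, one audio track, emotion transfer, and a choice of head, half-body, or full-body motion.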

Application Scenarios

  • E-commerce live streaming: Digital human hosts introduce products, enhancing interactive experiences.
  • Online streaming: Virtual hosts and virtual idol content creation.
  • Social media videos: Individuals and creators can easily make engaging digital human short videos.
  • Content creation and editing: Provides dynamic video generation tools for fields like anime and games.
  • Cultural heritage activation: Brings historical figures and artifacts to life as digital humans.