Tencent Open Sources Speech-Driven Digital Human Model HunyuanVideo-Avatar

Tencent’s Hunyuan team has open-sourced HunyuanVideo-Avatar, a speech-driven digital human model. From a single portrait image and an audio clip, it automatically generates natural, smooth digital human videos in which the pictured character speaks or sings. Whether for short video creation, e-commerce advertising, or virtual hosting, HunyuanVideo-Avatar gives content creators and businesses a convenient way to produce digital human videos.

Video Demonstrations

  • Multi-scene female solo
  • Multi-scene dialogue example
  • Multi-style character demonstration

Key Features

  • Dynamic video generation from a single image and audio: Users upload just one portrait image and an audio clip. The model interprets the audio and generates a natural speaking or singing video, including facial expressions, lip sync, and full-body movements.
  • High fidelity and dynamic performance: Supports high-quality, dynamic digital human videos, covering head, half-body, and full-body movements.
  • Multi-style, multi-species, and dual-person support: Not only supports real humans but can also generate dynamic videos in various artistic styles (such as anime, ink painting) and different species (such as robots, animals), supporting multi-character interaction.
  • Emotion transfer and control: Can extract emotional cues from reference images and transfer them to the generated video, enabling detailed emotional style control.
  • Character consistency: A character image injection module keeps the character’s appearance highly consistent across the generated video while preserving natural motion.
  • Facial-aware audio adaptation: In multi-character scenarios, uses a facial-aware audio adapter for independent audio driving, supporting multi-character dialogues.
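
The single-image-plus-audio workflow described above can be sketched as a request structure. Everything below (class name, field names, accepted file formats, the `body_range` options) is illustrative only and is not the model’s actual API; consult the official HunyuanVideo-Avatar repository for the real interface.

```python
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class AvatarRequest:
    """Hypothetical input bundle for a speech-driven avatar generation call."""
    image_path: str                        # single portrait image of the character
    audio_path: str                        # driving speech or singing clip
    emotion_ref_path: Optional[str] = None # optional emotion-reference image
    body_range: str = "half"               # "head", "half", or "full" body motion

def validate(req: AvatarRequest) -> List[str]:
    """Return a list of problems with the request (empty list if it looks valid)."""
    problems = []
    if not req.image_path.lower().endswith((".png", ".jpg", ".jpeg")):
        problems.append("image_path should be a still image file")
    if not req.audio_path.lower().endswith((".wav", ".mp3")):
        problems.append("audio_path should be an audio file")
    if req.body_range not in ("head", "half", "full"):
        problems.append("body_range must be 'head', 'half', or 'full'")
    return problems
```

A request with a portrait image, an audio clip, and optionally an emotion-reference image mirrors the feature list above: one image, one audio track, emotion transfer, and a choice of head, half-body, or full-body motion.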

Application Scenarios

  • E-commerce live streaming: Digital human hosts introduce products, enhancing interactive experiences.
  • Online streaming: Virtual hosts and virtual idol content creation.
  • Social media videos: Individuals and creators can easily make engaging digital human short videos.
  • Content creation and editing: Provides dynamic video generation tools for fields like anime and games.
  • Cultural heritage activation: Brings historical figures and artifacts to life as digital humans.