ByteDance Releases OmniHuman: Next-Generation Human Animation Framework

On February 3rd, the ByteDance research team released “OmniHuman-1”, a human animation generation framework. The work is described in the paper “OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models” and showcases the latest advances in human animation generation.

Key Features of OmniHuman

OmniHuman is an end-to-end multimodal conditional human video generation framework with the following features:

  • Simplified Input Requirements: Needs only a single human image and a motion signal (such as audio or video) to generate a human animation (illustrated in the sketch after this list)
  • Flexible Input Support: Handles images of any aspect ratio, including portrait, half-body, and full-body shots
  • Diverse Driving Methods: Supports motion driving through text, audio, video, or combinations of these
  • Detail Performance: Performs well on fine details such as hand movements and lip synchronization
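
OmniHuman has released no code or public API, so the following is a purely illustrative sketch of the input contract these features imply; the class and every field name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OmniHumanInputs:
    """Hypothetical input bundle: OmniHuman has released no code or API,
    so every field name here is illustrative only."""
    reference_image: str              # a single human image, any aspect ratio
    audio: Optional[str] = None       # audio clip for lip sync and gestures
    pose_video: Optional[str] = None  # optional video providing motion
    text: Optional[str] = None        # optional text-based motion prompt

# Example: a portrait image driven purely by speech audio.
inputs = OmniHumanInputs(reference_image="portrait.png", audio="speech.wav")
```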

Technical Implementation

The research team adopted a mixed-condition training strategy:

  1. Uses a DiT (Diffusion Transformer) architecture as the foundation, integrating processing for multiple driving signals
  2. Designs an Omni-Conditions mechanism that fuses audio, pose, and other features (sketched after this list)
  3. Employs a multi-stage training method to balance the different conditions
  4. Trains on a dataset of 18.7K hours of human-related data
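
To make the Omni-Conditions idea concrete, here is a minimal PyTorch sketch. Since no code accompanies the paper, it assumes audio features enter each DiT block via cross-attention and that each training stage assigns lower usage ratios to stronger conditions; all class names, dimensions, and ratio values are illustrative.

```python
import torch
import torch.nn as nn

class OmniConditionsBlock(nn.Module):
    """One DiT-style block that fuses an audio condition via cross-attention.
    Illustrative only: OmniHuman's actual fusion layout is not public."""

    def __init__(self, dim: int, audio_dim: int, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.audio_attn = nn.MultiheadAttention(
            dim, n_heads, kdim=audio_dim, vdim=audio_dim, batch_first=True
        )
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # Self-attention over the flattened video latent tokens.
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        # Cross-attention injects audio-driven motion cues.
        h = self.norm2(x)
        x = x + self.audio_attn(h, audio, audio)[0]
        # Position-wise feed-forward.
        return x + self.mlp(self.norm3(x))

# Multi-stage balancing: weaker conditions keep higher training ratios so
# that stronger ones (e.g. pose) do not dominate. Ratios here are made up.
STAGE_CONDITION_RATIOS = {
    "stage1": {"text": 1.0, "audio": 0.0, "pose": 0.0},
    "stage2": {"text": 1.0, "audio": 0.5, "pose": 0.0},
    "stage3": {"text": 1.0, "audio": 0.5, "pose": 0.25},
}

# Smoke test with random tensors: 16 video tokens, 32 audio tokens.
block = OmniConditionsBlock(dim=64, audio_dim=128)
out = block(torch.randn(2, 16, 64), torch.randn(2, 32, 128))
print(out.shape)  # torch.Size([2, 16, 64])
```

The staged schedule reflects the balancing idea the team describes: stronger conditions such as pose would be introduced later and sampled less often, so that weaker signals such as audio and text still shape the model.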

Potential Applications

OmniHuman’s application scenarios include:

  • Virtual host production
  • Digital human performance
  • Video content creation
  • Remote meeting avatars

Current Status

OmniHuman is not currently available for download, and no public service has been launched. The research team indicates that more updates will be provided in the future.