ByteDance Releases OmniHuman: Next-Generation Human Animation Framework
The ByteDance research team recently (February 3rd) released "OmniHuman-1", a human animation generation framework. The work is presented in the paper "OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models" and showcases the latest advances in human animation generation.
- Project Homepage: https://omnihuman-lab.github.io/
Key Features of OmniHuman
OmniHuman is an end-to-end multimodal conditional human video generation framework with the following features:
- Simplified Input Requirements: Requires only a single human image and motion signals (such as audio or video) to generate human animation (see the input-contract sketch after this list)
- Flexible Input Support: Can process images of any aspect ratio, including portraits, half-body, and full-body shots
- Diverse Driving Methods: Supports motion driving through text, audio, video, and other means
- Fine-Grained Detail: Strong performance on details such as hand movements and lip synchronization
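OmniHuman is not publicly released, so there is no official API. Purely to illustrate the input contract described above (one reference image plus an audio, video, or text motion signal), here is a hypothetical sketch; every name in it (`AnimationRequest`, `generate_animation`, the field names) is an assumption, not ByteDance's interface.

```python
# Hypothetical interface sketch only -- OmniHuman has no public API or release.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnimationRequest:
    reference_image: str                    # path to a single human image (any aspect ratio)
    driving_audio: Optional[str] = None     # e.g. speech audio for lip-synced animation
    driving_video: Optional[str] = None     # e.g. a clip supplying pose/motion
    text_prompt: Optional[str] = None       # optional text-based driving signal


def generate_animation(request: AnimationRequest) -> str:
    """Placeholder: would return the path of the generated animation video."""
    if not (request.driving_audio or request.driving_video or request.text_prompt):
        raise ValueError("At least one motion signal (audio, video, or text) is required")
    raise NotImplementedError("Illustrative stub; OmniHuman is not publicly available")
```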
Technical Implementation
The research team adopted an innovative mixed-condition training strategy:
- Uses a DiT (Diffusion Transformer) architecture as the foundation, integrating the processing of multiple driving signals
- Designs an Omni-Conditions mechanism that fuses audio, pose, and other conditioning features (a hedged sketch follows this list)
- Employs a multi-stage training method to balance different conditions
- Training dataset includes 18.7K hours of human-related data
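The paper describes the Omni-Conditions design at a high level, and no official code has been released. As a rough illustration of what fusing audio and pose features into a DiT-style backbone can look like, here is a minimal PyTorch sketch; the module names, dimensions, and the choice of cross-attention fusion are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an "omni-conditions" fusion block (not the official OmniHuman code).
# Assumes audio and pose features are projected to the transformer's hidden size and
# injected into DiT-style video latent tokens via cross-attention.
import torch
import torch.nn as nn


class OmniConditionFusion(nn.Module):
    def __init__(self, hidden_dim=1024, audio_dim=768, pose_dim=256, num_heads=8):
        super().__init__()
        # Project each driving signal into the transformer's hidden dimension
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.pose_proj = nn.Linear(pose_dim, hidden_dim)
        # Cross-attention: video latent tokens attend to the fused condition tokens
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, latent_tokens, audio_feats, pose_feats):
        # latent_tokens: (B, N, hidden_dim)  video latent tokens from the DiT backbone
        # audio_feats:   (B, Ta, audio_dim)  e.g. frame-aligned audio embeddings
        # pose_feats:    (B, Tp, pose_dim)   e.g. per-frame pose keypoint embeddings
        cond = torch.cat(
            [self.audio_proj(audio_feats), self.pose_proj(pose_feats)], dim=1
        )  # (B, Ta + Tp, hidden_dim)
        fused, _ = self.cross_attn(latent_tokens, cond, cond)
        return self.norm(latent_tokens + fused)  # residual update of the latents


# Example with toy shapes
fusion = OmniConditionFusion()
out = fusion(torch.randn(2, 64, 1024), torch.randn(2, 50, 768), torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 64, 1024])
```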
Potential Applications
OmniHuman's application scenarios include:
- Virtual host production
- Digital human performance
- Video content creation
- Remote meeting avatars
Current Status
Currently, OmniHuman is not available for download or as a public service. The research team indicates it will provide more updates in the future.
Resource Links
- Project Homepage: https://omnihuman-lab.github.io/
- Paper Link: OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models