ComfyUI Wiki

Alibaba Releases Wan-Animate Model - Unified Character Animation and Replacement Technology

Alibaba Tongyi Lab today officially released Wan-Animate, a unified character animation framework based on Wan2.2. The model can accurately replicate a character’s expressions and movements from a reference video to generate high-fidelity character videos, while also supporting seamless integration of animated characters into reference videos to replace original characters.

Demo Videos

[Demo video embeds from the original page are not reproduced here.]

Core Features

Wan-Animate provides two main functional modes:

Animation Mode: Given a character image and reference video, the model can animate the character by precisely replicating the expressions and movements in the video, generating high-quality character videos.

Replacement Mode: Integrates animated characters into reference videos to replace original characters, while replicating the scene’s lighting and color tone for seamless environmental integration.

Technical Innovations

Unified Input Framework

Wan-Animate is built upon the Wan-I2V model, employing a modified input paradigm to distinguish between reference conditions and generation regions. This design unifies reference image injection, temporal frame guidance, and mode selection into a universal symbolic representation, effectively reducing distribution shift during training.

Holistic Control Strategy

The model decomposes control signals into two parts, body movements and facial expressions:

  • Body Control: Uses skeleton-based representation, injected into initial noise latent variables through spatial alignment
  • Facial Control: Directly uses raw facial images from the reference video as driving signals, encoded as latent vectors to separate expression information from identity attributes
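The dual-control design above can be sketched in code. This is an illustrative toy, not the released implementation: the latent shapes, the stand-in skeleton "encoder", and the function names are all assumptions made for the example; the only grounded idea is that body control is spatially aligned with and injected into the initial noise latent.

```python
import numpy as np

# Toy sketch of Wan-Animate's body-control injection (all shapes assumed).
# Skeleton renderings are reduced to latent resolution, then added to the
# initial noise latent at spatially aligned positions. Facial control in the
# real model is handled separately via encoded face latents.

def encode_skeleton(pose_maps):
    """Stand-in encoder: 8x8 average-pool pose maps down to latent resolution."""
    t, h, w, c = pose_maps.shape
    return pose_maps.reshape(t, h // 8, 8, w // 8, 8, c).mean(axis=(2, 4))

def inject_body_control(noise_latent, pose_maps):
    """Spatially aligned addition of skeleton features to the initial noise."""
    return noise_latent + encode_skeleton(pose_maps)

noise = np.random.randn(16, 8, 8, 4)   # (frames, H/8, W/8, channels), assumed
poses = np.random.rand(16, 64, 64, 4)  # per-frame skeleton renderings, assumed
latent = inject_body_control(noise, poses)
print(latent.shape)  # → (16, 8, 8, 4)
```

The point of spatial alignment is that each latent position receives the pose signal for the same image region, so no cross-attention is needed for body control.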

Environmental Lighting Adaptation

To enhance environmental consistency during character replacement, the team developed an auxiliary relighting LoRA module. This module applies appropriate environmental lighting and color tones while maintaining character appearance consistency, achieving more natural scene fusion effects.
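The general mechanics of a LoRA adapter can be shown in a few lines. The rank, placement, and training details of the actual relighting LoRA are not public; this is a minimal sketch of the standard LoRA formulation, where a trainable low-rank update is added to a frozen weight so lighting and tone can be adapted without touching the base model.

```python
import numpy as np

# Minimal LoRA sketch (standard formulation; the relighting LoRA's rank and
# placement in Wan-Animate are assumptions). A low-rank product B @ A is
# added to a frozen weight W; with B initialized to zero, the adapted layer
# starts out identical to the base model.

rank, d_in, d_out = 4, 64, 64
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))          # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                     # trainable up-projection

def lora_forward(x, scale=1.0):
    """y = W x + scale * B (A x); only A and B would be trained."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identity to base model at init
```

Because only A and B are trained, the adapter is small and can be toggled off (scale=0), which suits an auxiliary effect like relighting.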

Performance

Experimental results show that Wan-Animate achieves state-of-the-art performance across multiple evaluation dimensions:

  • Surpasses existing open-source character animation frameworks on quantitative metrics such as SSIM, LPIPS, and FVD
  • Performs competitively with commercial solutions such as Runway Act-Two and ByteDance DreamActor-M1 in human evaluations
  • Supports arbitrary output resolutions, maintaining the same aspect ratio as reference videos in replacement mode
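Of the metrics listed, SSIM is the simplest to state explicitly. The sketch below computes global SSIM from its standard formula (reported benchmarks typically use a windowed variant, and LPIPS/FVD require learned networks, so they are omitted); the constants follow the usual choice for images scaled to [0, 1].

```python
import numpy as np

# Global SSIM over a whole image, using the standard formula
# ((2*mu_x*mu_y + C1)(2*cov_xy + C2)) / ((mu_x^2 + mu_y^2 + C1)(var_x + var_y + C2)).
# Benchmarks usually average a windowed version, but the per-window math is this.

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

a = np.random.default_rng(1).random((64, 64))
print(round(ssim(a, a), 4))  # identical frames → 1.0
```

Higher SSIM means closer structural agreement with the reference; LPIPS is the reverse (lower is better), and FVD scores distributional similarity across whole videos.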

Application Scenarios

Wan-Animate has broad application potential across multiple fields:

  • Film & TV Production: Recreating classic performance scenes, achieving cross-style character transformations
  • Advertising Creativity: Character replacement and commercial photography editing
  • Short Video Content: Dance movement replication and dynamic camera motion generation
  • Digital Avatars: Personalized character animation creation

Technical Specifications

The current version supports the following input specifications:

  • Video files: under 200 MB; shorter side greater than 200 pixels; longer side less than 2048 pixels
  • Video duration: 2-30 seconds; aspect ratio between 1:3 and 3:1
  • Image files: under 5 MB; supported formats: jpg, png, jpeg, webp, bmp
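The video limits above are easy to check before upload. The function below is an illustrative pre-flight validator written from the stated constraints; the parameter names are invented for this sketch and do not correspond to any documented API.

```python
# Pre-upload check against the stated Wan-Animate video limits
# (parameter names are illustrative, not an official API).

def validate_video(size_mb, width, height, duration_s):
    errors = []
    if size_mb >= 200:
        errors.append("video must be under 200 MB")
    if min(width, height) <= 200:
        errors.append("shorter side must exceed 200 px")
    if max(width, height) >= 2048:
        errors.append("longer side must be under 2048 px")
    if not 2 <= duration_s <= 30:
        errors.append("duration must be 2-30 seconds")
    if not 1 / 3 <= width / height <= 3:
        errors.append("aspect ratio must be within 1:3 to 3:1")
    return errors

print(validate_video(50, 1280, 720, 10))   # → []
print(validate_video(250, 100, 4000, 1))   # five violations
```

A 720p clip of 10 seconds passes cleanly, while an oversized, overlong-side, too-short clip trips every rule at once.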

With the release of Wan-Animate, Alibaba brings a powerful and easy-to-use open-source tool to the character animation field, promising to further advance and popularize related technologies.