Alibaba’s Wan2.1 Video Generation Model Officially Open-Sourced
On February 25, 2025, Alibaba announced that Wan2.1, its latest video generation model, has been officially open-sourced. The model outperforms existing open-source models, and its lightweight version requires only 8.19GB of video memory, which significantly lowers the barrier to entry.
Key Highlights
Wan2.1's main advances fall into three areas:
1. Exceptional Performance and Low Resource Requirements
- Ranked first on the VBench leaderboard with a total score of 86.22%, surpassing models like Sora (84.28%) and Luma (83.61%)
- The lightweight T2V-1.3B version requires only 8.19GB of video memory, making it possible to run on consumer-grade graphics cards
- Supports the generation of 8K resolution videos with details reaching cinematic standards
2. Comprehensive Functionality Support
- Supports multiple tasks such as text-to-video (T2V), image-to-video (I2V), and video editing
- First video model to generate both Chinese and English text within video, supporting dynamic subtitles and artistic fonts
- Adds video-to-audio (V2A) functionality, achieving synchronized audio and video generation
3. Innovative Technical Architecture
- Trained using the linear noise trajectory Flow Matching paradigm
- Wan-VAE can encode and decode 1080P videos of any length
- The 3D causal convolution module enhances physical simulation capabilities
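The "linear noise trajectory" idea behind Flow Matching can be illustrated with a small sketch. It assumes the common rectified-flow convention, in which x0 is Gaussian noise, x1 is the data sample, the path is x_t = (1 - t) * x0 + t * x1, and the model is trained to predict the constant velocity x1 - x0. The function names are hypothetical and plain Python lists stand in for video latents; this is not Wan2.1's training code.

```python
def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Derivative of the linear path with respect to t; constant along the path."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


# Toy example: a "model" that already knows the true velocity has zero loss.
noise = [0.0, 0.0]
data = [2.0, 4.0]
oracle = lambda x_t, t: target_velocity(noise, data)
print(interpolate(noise, data, 0.25))
print(flow_matching_loss(oracle, noise, data, 0.25))
```

Because the target velocity is constant along a straight path, sampling reduces to integrating the predicted velocity from noise toward data, which is one reason this formulation is popular for large generative models.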
Version Selection and Hardware Requirements
Wan2.1 offers two versions to cater to different scenarios:
Speed Edition (1.3B)
- Requires only 8.19GB of video memory
- Suitable for individual developers
- Generates a 5-second 480P video in approximately 4 minutes on an RTX 4090
Professional Edition (14B)
- Supports 720P professional-level rendering
- Suitable for film and television industry applications
- Offers a richer set of special effects interfaces
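To put the two parameter counts in perspective, the following back-of-the-envelope sketch estimates how much memory the model weights alone would occupy at a given numeric precision. The bytes-per-parameter table and the function name are illustrative assumptions. The estimate also ignores activations and any auxiliary components, so it is a lower bound rather than a reproduction of the 8.19GB figure.

```python
# Bytes needed to store one parameter at common precisions (assumed values).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in (("1.3B", 1.3e9), ("14B", 14e9)):
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights in fp16")
```

Under these assumptions the 1.3B model needs roughly 2.6 GB for weights in 16-bit precision and the 14B model roughly 28 GB, which helps explain why only the smaller version is positioned for consumer-grade graphics cards.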
Downloading the Open-Source Models
All models are now available for download on the Hugging Face and ModelScope platforms:
- T2V-14B: Hugging Face | ModelScope
- I2V-14B-720P: Hugging Face | ModelScope
- T2V-1.3B: Hugging Face | ModelScope
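As a rough sketch of how one of these checkpoints might be fetched programmatically, the snippet below uses `snapshot_download` from the `huggingface_hub` package. The repository IDs under the `Wan-AI` organization are an assumption, since the original links are not reproduced here, and should be confirmed on Hugging Face before use. The helper names are hypothetical.

```python
# Assumed mapping from the version names above to Hugging Face repo IDs.
# Verify these IDs on the hub; they are not taken from the article itself.
REPO_IDS = {
    "T2V-14B": "Wan-AI/Wan2.1-T2V-14B",
    "I2V-14B-720P": "Wan-AI/Wan2.1-I2V-14B-720P",
    "T2V-1.3B": "Wan-AI/Wan2.1-T2V-1.3B",
}


def repo_id_for(version):
    """Return the (assumed) repo ID for a version name, or raise ValueError."""
    try:
        return REPO_IDS[version]
    except KeyError:
        raise ValueError(f"unknown Wan2.1 version: {version!r}") from None


def download(version, target_dir):
    """Download every file of the chosen checkpoint into target_dir."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(repo_id=repo_id_for(version), local_dir=target_dir)
```

For example, `download("T2V-1.3B", "./Wan2.1-T2V-1.3B")` would fetch the lightweight model, assuming the repository ID is correct and enough disk space is available.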
Application Scenarios
Wan2.1 applies to a broad range of scenarios, mainly:
Personal Creation
- Short video content generation
- Artistic creation assistance
- Image animation
Professional Production
- Film and television special effects production
- Advertising creative design
- Educational resource production
Industrial Applications
- Product demonstration animation
- Architectural visualization
- Industrial process visualization
Future Prospects
The open-sourcing of Wan2.1 brings new opportunities to AI video creation. Its low hardware requirements in particular allow more individual developers and small teams to experiment with AI video generation, which should both spread the technology and drive innovation across the industry.