TurboDiffusion Releases Video Generation Acceleration Framework
The machine learning team at Tsinghua University recently open-sourced TurboDiffusion, a video generation acceleration framework that significantly improves the generation speed of video diffusion models while maintaining video quality.
Acceleration Performance
According to official tests, TurboDiffusion achieves 100 to 205 times end-to-end diffusion generation acceleration on a single RTX 5090 graphics card.
1.3B Model Acceleration Performance
On the 1.3B parameter Wan2.1 model, TurboDiffusion’s performance is particularly outstanding:
- Original model: End-to-end generation time approximately 166 seconds
- TurboDiffusion: Only 1.8 seconds required, achieving approximately 92x acceleration
This means video generation that originally took nearly 3 minutes can now be completed in less than 2 seconds.
14B Model Acceleration Performance (480p Resolution)
For larger 14B parameter models, the acceleration effect is equally significant:
- Original model: End-to-end generation time approximately 1635 seconds (over 27 minutes)
- FastVideo: Approximately 23.2 seconds
- TurboDiffusion: Only 9.4 seconds required, achieving approximately 174x acceleration compared to the original model
Compared with other acceleration solutions, TurboDiffusion also keeps a clear speed advantage, running about 2.5 times faster than FastVideo.
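For reference, the quoted speedup factors follow directly from the reported wall-clock times, as the quick check below shows (the dictionary labels are purely illustrative):

```python
# Speedups implied by the reported end-to-end times on a single RTX 5090.
times = {
    "Wan2.1-1.3B original": 166.0,            # seconds
    "Wan2.1-1.3B + TurboDiffusion": 1.8,
    "Wan2.1-14B original": 1635.0,
    "Wan2.1-14B + FastVideo": 23.2,
    "Wan2.1-14B + TurboDiffusion": 9.4,
}
print(times["Wan2.1-1.3B original"] / times["Wan2.1-1.3B + TurboDiffusion"])   # ~92x
print(times["Wan2.1-14B original"] / times["Wan2.1-14B + TurboDiffusion"])     # ~174x
print(times["Wan2.1-14B + FastVideo"] / times["Wan2.1-14B + TurboDiffusion"])  # ~2.5x vs FastVideo
```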
Video Quality Preservation
Importantly, despite the dramatic increase in generation speed, TurboDiffusion keeps video quality close to that of the original model. Official side-by-side comparisons show that the accelerated videos remain consistent with the originally generated ones in image detail, motion smoothness, and overall quality.
Technical Features
TurboDiffusion achieves its acceleration through a combination of optimization techniques, including a Sparse Linear Attention (SLA) mechanism and SageAttention quantization. These techniques sharply reduce the computational load without noticeably affecting video quality, which is where the speedup comes from.
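As a rough intuition for these two ideas, the sketch below quantizes Q and K onto an int8 grid before the attention matmul (the SageAttention-style idea) and then keeps only the highest-scoring key blocks per query (a crude stand-in for SLA's block-sparse pattern). This is plain-PyTorch illustration code, not TurboDiffusion's fused CUDA kernels; the function name, block size, and top-k policy are assumptions made for the example.

```python
import torch

def toy_quantized_sparse_attention(q, k, v, block_size: int = 64, keep_blocks: int = 4):
    """Illustrative only. q, k, v: [batch, heads, seq_len, head_dim];
    seq_len must be divisible by block_size, keep_blocks <= seq_len // block_size."""
    scale = q.shape[-1] ** -0.5

    # SageAttention-style idea: snap Q and K onto an int8 grid before QK^T,
    # then rescale the scores. (Values stay in float here so the sketch runs
    # anywhere; real kernels use true int8 tensor-core matmuls.)
    q_scale = q.abs().amax() / 127.0
    k_scale = k.abs().amax() / 127.0
    q8 = (q / q_scale).round().clamp(-127, 127)
    k8 = (k / k_scale).round().clamp(-127, 127)
    scores = (q8 @ k8.transpose(-1, -2)) * (q_scale * k_scale) * scale

    # Sparse-attention idea: per query, keep only the top-scoring key blocks
    # and mask out the rest before the softmax.
    b, h, s, _ = scores.shape
    pooled = scores.view(b, h, s, s // block_size, block_size).mean(-1)
    top = pooled.topk(keep_blocks, dim=-1).indices
    keep = torch.zeros_like(pooled, dtype=torch.bool).scatter_(-1, top, True)
    keep = keep.unsqueeze(-1).expand(-1, -1, -1, -1, block_size).reshape(b, h, s, s)
    scores = scores.masked_fill(~keep, float("-inf"))

    return scores.softmax(dim=-1) @ v

# e.g. a 512-token sequence, 8 heads of dim 64: only 4 of 8 key blocks are attended
q = k = v = torch.randn(1, 8, 512, 64)
out = toy_quantized_sparse_attention(q, k, v)  # shape [1, 8, 512, 64]
```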
The framework supports training and inference based on the Wan2.1 model and provides complete training code and infrastructure support, including FSDP2, Ulysses CP, and selective activation checkpointing.
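To make "selective activation checkpointing" concrete, the sketch below recomputes activations for only every other transformer block during training, using PyTorch's generic torch.utils.checkpoint utility. This is not TurboDiffusion's actual training code: the block architecture and the every-other-block policy are assumptions for illustration, and the FSDP2 / Ulysses CP sharding mentioned above is a separate concern not shown here.

```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """A generic pre-norm transformer block, used purely for illustration."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class Transformer(nn.Module):
    def __init__(self, depth: int = 12, dim: int = 512, ckpt_every: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])
        self.ckpt_every = ckpt_every  # only every ckpt_every-th block is checkpointed

    def forward(self, x):
        for i, blk in enumerate(self.blocks):
            if self.training and i % self.ckpt_every == 0:
                # this block's activations are freed and recomputed during backward
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x
```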
Application Scenarios
The framework is aimed primarily at scenarios that require rapid video generation, helping users cut generation time substantially and work more efficiently.
In practical applications, TurboDiffusion can significantly improve user experience in the following scenarios:
- Creative previewing: Quickly generate multiple versions for creative comparison and selection
- Real-time feedback: Obtain near real-time visual feedback when adjusting parameters
- Batch generation: Generate more video content in the same amount of time
- Resource-constrained environments: Achieve efficient video generation even on single-card devices
Additionally, because the framework keeps video quality close to that of the original model, it is also suitable for users with high quality requirements.
Open Source Information
TurboDiffusion is open-sourced under the Apache-2.0 license, with code and documentation publicly available on GitHub. The development team says it is actively working on additional features, including optimized parallel computing, vLLM-Omni integration, and support for more video generation models.
Demonstrations
The GitHub repository includes several side-by-side comparisons of real generation cases, covering different scenarios and model scales. The demos show both the generation time before and after acceleration and the resulting video quality. The complete set of comparisons can be viewed on the project homepage.
Related Links
- GitHub Repository: https://github.com/thu-ml/TurboDiffusion
- Demo Video: https://github.com/thu-ml/TurboDiffusion#turbodiffusion
- Paper: TurboDiffusion: Accelerating Video Diffusion Models by 100-205 Times