TurboDiffusion Releases Video Generation Acceleration Framework
The machine learning team at Tsinghua University recently open-sourced TurboDiffusion, a video generation acceleration framework that significantly improves the generation speed of video diffusion models while maintaining video quality.
Acceleration Performance
According to official tests, TurboDiffusion achieves 100 to 205 times end-to-end diffusion generation acceleration on a single RTX 5090 graphics card.
1.3B Model Acceleration Performance
On the 1.3B parameter Wan2.1 model, TurboDiffusion’s performance is particularly outstanding:
- Original model: End-to-end generation time approximately 166 seconds
- TurboDiffusion: Only 1.8 seconds required, achieving approximately 92x acceleration
This means video generation that originally took nearly 3 minutes can now be completed in less than 2 seconds.
14B Model Acceleration Performance (480p Resolution)
For larger 14B parameter models, the acceleration effect is equally significant:
- Original model: End-to-end generation time approximately 1635 seconds (over 27 minutes)
- FastVideo: Approximately 23.2 seconds
- TurboDiffusion: Only 9.4 seconds required, achieving approximately 174x acceleration compared to the original model
Compared with other acceleration solutions, TurboDiffusion also keeps a clear speed advantage, running about 2.5 times faster than FastVideo.
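For reference, the quoted speedup factors follow directly from the reported wall-clock times, as the quick check below shows (the dictionary labels are purely illustrative):

```python
# Speedups implied by the reported end-to-end times on a single RTX 5090.
times = {
    "Wan2.1-1.3B original": 166.0,            # seconds
    "Wan2.1-1.3B + TurboDiffusion": 1.8,
    "Wan2.1-14B original": 1635.0,
    "Wan2.1-14B + FastVideo": 23.2,
    "Wan2.1-14B + TurboDiffusion": 9.4,
}
print(times["Wan2.1-1.3B original"] / times["Wan2.1-1.3B + TurboDiffusion"])   # ~92x
print(times["Wan2.1-14B original"] / times["Wan2.1-14B + TurboDiffusion"])     # ~174x
print(times["Wan2.1-14B + FastVideo"] / times["Wan2.1-14B + TurboDiffusion"])  # ~2.5x vs FastVideo
```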
Video Quality Preservation
Importantly, despite the dramatic increase in generation speed, TurboDiffusion keeps video quality close to that of the original model. Official side-by-side comparisons show that the accelerated videos remain consistent with the originally generated ones in image detail, motion smoothness, and overall quality.
Technical Features
TurboDiffusion achieves its acceleration through a combination of optimization techniques, including a Sparse Linear Attention (SLA) mechanism and SageAttention quantization. These techniques sharply reduce the computational load without noticeably affecting video quality, which is where the speedup comes from.
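As a rough intuition for these two ideas, the sketch below quantizes Q and K onto an int8 grid before the attention matmul (the SageAttention-style idea) and then keeps only the highest-scoring key blocks per query (a crude stand-in for SLA's block-sparse pattern). This is plain-PyTorch illustration code, not TurboDiffusion's fused CUDA kernels; the function name, block size, and top-k policy are assumptions made for the example.

```python
import torch

def toy_quantized_sparse_attention(q, k, v, block_size: int = 64, keep_blocks: int = 4):
    """Illustrative only. q, k, v: [batch, heads, seq_len, head_dim];
    seq_len must be divisible by block_size, keep_blocks <= seq_len // block_size."""
    scale = q.shape[-1] ** -0.5

    # SageAttention-style idea: snap Q and K onto an int8 grid before QK^T,
    # then rescale the scores. (Values stay in float here so the sketch runs
    # anywhere; real kernels use true int8 tensor-core matmuls.)
    q_scale = q.abs().amax() / 127.0
    k_scale = k.abs().amax() / 127.0
    q8 = (q / q_scale).round().clamp(-127, 127)
    k8 = (k / k_scale).round().clamp(-127, 127)
    scores = (q8 @ k8.transpose(-1, -2)) * (q_scale * k_scale) * scale

    # Sparse-attention idea: per query, keep only the top-scoring key blocks
    # and mask out the rest before the softmax.
    b, h, s, _ = scores.shape
    pooled = scores.view(b, h, s, s // block_size, block_size).mean(-1)
    top = pooled.topk(keep_blocks, dim=-1).indices
    keep = torch.zeros_like(pooled, dtype=torch.bool).scatter_(-1, top, True)
    keep = keep.unsqueeze(-1).expand(-1, -1, -1, -1, block_size).reshape(b, h, s, s)
    scores = scores.masked_fill(~keep, float("-inf"))

    return scores.softmax(dim=-1) @ v

# e.g. a 512-token sequence, 8 heads of dim 64: only 4 of 8 key blocks are attended
q = k = v = torch.randn(1, 8, 512, 64)
out = toy_quantized_sparse_attention(q, k, v)  # shape [1, 8, 512, 64]
```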
The framework supports training and inference based on the Wan2.1 model and provides complete training code and infrastructure support, including FSDP2, Ulysses CP, and selective activation checkpointing.
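To make "selective activation checkpointing" concrete, the sketch below recomputes activations for only every other transformer block during training, using PyTorch's generic torch.utils.checkpoint utility. This is not TurboDiffusion's actual training code: the block architecture and the every-other-block policy are assumptions for illustration, and the FSDP2 / Ulysses CP sharding mentioned above is a separate concern not shown here.

```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """A generic pre-norm transformer block, used purely for illustration."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class Transformer(nn.Module):
    def __init__(self, depth: int = 12, dim: int = 512, ckpt_every: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])
        self.ckpt_every = ckpt_every  # only every ckpt_every-th block is checkpointed

    def forward(self, x):
        for i, blk in enumerate(self.blocks):
            if self.training and i % self.ckpt_every == 0:
                # this block's activations are freed and recomputed during backward
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x
```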
Application Scenarios
The framework is aimed primarily at scenarios that require rapid video generation, helping users cut generation time substantially and work more efficiently.
In practical applications, TurboDiffusion can significantly improve user experience in the following scenarios:
- Creative previewing: Quickly generate multiple versions for creative comparison and selection
- Real-time feedback: Obtain near real-time visual feedback when adjusting parameters
- Batch generation: Generate more video content in the same amount of time
- Resource-constrained environments: Achieve efficient video generation even on single-card devices
Additionally, because the framework keeps video quality close to that of the original model, it is also suitable for users with high quality requirements.
Open Source Information
TurboDiffusion is open-sourced under the Apache-2.0 license, with code and documentation publicly available on GitHub. The development team says it is actively working on additional features, including optimized parallel computing, vLLM-Omni integration, and support for more video generation models.
Demonstrations
The GitHub repository includes several side-by-side comparisons of real generation cases, covering different scenarios and model scales. The demos show both the generation time before and after acceleration and the resulting video quality. The complete set of comparisons can be viewed on the project homepage.
Related Links
- GitHub Repository: https://github.com/thu-ml/TurboDiffusion
- Demo Video: https://github.com/thu-ml/TurboDiffusion#turbodiffusion
- Paper: TurboDiffusion: Accelerating Video Diffusion Models by 100-205 Times