PUSA V1.0: Low-Cost, High-Performance Video Generation Model Released
On July 16, 2025, PUSA V1.0 was officially released. Built on Wan2.1-T2V-14B, the model introduces Vectorized Timestep Adaptation (VTA). Compared with Wan-I2V-14B, it needs only about 1/2500 of the training data, 1/200 of the training cost, and 1/5 of the inference steps, yet it surpasses Wan-I2V-14B in performance.
What is PUSA V1.0?
PUSA V1.0 is an open-source AI model for video generation built around Vectorized Timestep Adaptation (VTA). Traditional video diffusion models apply a single scalar timestep to every frame of a clip. PUSA instead gives each frame its own timestep, so the noise level can be controlled frame by frame. This results in higher generation quality and lets one model handle multiple tasks.
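The difference between a scalar timestep and a per-frame timestep vector can be illustrated with a toy example. This is a minimal sketch, not PUSA's actual code: the linear blend `(1 - t) * x + t * noise`, the helper name `mix_noise`, and the toy frame values are all assumptions made for illustration.

```python
import random


def mix_noise(frames, noise, timesteps):
    """Blend each frame with noise using that frame's own timestep.

    Assumed linear schedule: t = 0.0 keeps a frame clean, t = 1.0 is pure noise.
    """
    if not (len(frames) == len(noise) == len(timesteps)):
        raise ValueError("frames, noise, and timesteps must have the same length")
    return [
        [(1.0 - t) * x + t * n for x, n in zip(frame, eps)]
        for frame, eps, t in zip(frames, noise, timesteps)
    ]


rng = random.Random(0)
num_frames, values_per_frame = 4, 3

# Toy "video": frame i is a flat list filled with the value i.
frames = [[float(i)] * values_per_frame for i in range(num_frames)]
noise = [
    [rng.gauss(0.0, 1.0) for _ in range(values_per_frame)]
    for _ in range(num_frames)
]

# Scalar timestep: one noise level, broadcast to every frame.
scalar_t = [1.0] * num_frames
# Vectorized timestep: frame 0 stays clean, the remaining frames are fully noised.
vector_t = [0.0, 1.0, 1.0, 1.0]

uniform = mix_noise(frames, noise, scalar_t)
per_frame = mix_noise(frames, noise, vector_t)

print("frame 0 kept clean:", per_frame[0] == frames[0])
```

With the scalar schedule every frame is noised equally. With the vector schedule the first frame survives untouched and can act as a conditioning image, which is the intuition behind using one model for image-to-video.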
Key Features and Innovations
- Vectorized Timestep Adaptation (VTA): Breaks the limitation of scalar timesteps, enabling flexible frame-level control.
- Highly Efficient: Trained on only 3,860 video samples at a cost of about $500, and runs with about 1/5 of the inference steps.
- Multi-Task Support: Supports image-to-video (I2V), keyframe generation, video completion, video extension, text-to-video (T2V), video transitions, and more.
- Non-Destructive Fine-Tuning: Adds new features via LoRA fine-tuning while retaining all original model capabilities, ensuring strong compatibility.
- Open Source: Model weights, training data, and both inference and training code are fully open for community and industry research and application.
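Why LoRA fine-tuning can be non-destructive is easiest to see in the general LoRA formulation, where a low-rank update is added on top of frozen base weights. The sketch below shows that generic technique on a toy 2x2 matrix. It is not PUSA's training code; the helper names (`matmul`, `lora_weight`) and all numeric values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(base, lora_b, lora_a, alpha, rank, enabled=True):
    """Return base + (alpha / rank) * (B @ A), or a copy of base when disabled."""
    if not enabled:
        return [row[:] for row in base]
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Frozen base weights (2x2); they are never modified in place.
base = [[1.0, 2.0], [3.0, 4.0]]
# Rank-1 adapter: B is 2x1 and A is 1x2 (toy values).
lora_b = [[0.5], [-0.5]]
lora_a = [[1.0, 2.0]]

adapted = lora_weight(base, lora_b, lora_a, alpha=2.0, rank=1)
restored = lora_weight(base, lora_b, lora_a, alpha=2.0, rank=1, enabled=False)

print("adapted: ", adapted)
print("restored:", restored)
```

Because the adapter is stored separately from the base weights, switching it off recovers the original model exactly.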
Comparison with Wan-I2V
PUSA V1.0 surpasses Wan-I2V-14B in performance while using far less training data and compute. Wan-I2V supports only image-to-video, whereas PUSA V1.0 unifies multiple tasks in one model and scores higher on the VBench-I2V evaluation (87.32% vs 86.86%).
Application Scenarios
- AI Creative Video Generation: Quickly generate high-quality short videos from an image or text.
- Video Completion and Extension: Complete or extend existing videos, including keyframe completion.
- Multi-Frame Keyframe Interpolation: Generate smooth video transitions from multiple keyframes.
- Education, Entertainment, Advertising: Provides efficient video generation tools for creators, educators, and advertisers.
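One way to picture how a single model can cover these scenarios is to treat each task as a different pattern of per-frame timesteps, where the frames supplied by the user are held clean and the rest are generated. The following is a rough sketch under that assumption. The helper `timestep_vector`, the task names, the frame count, and the choice of 0.0 for conditioning frames are illustrative and not taken from the PUSA code.

```python
def timestep_vector(num_frames, clean_frames, t):
    """Return one timestep per frame: 0.0 for supplied frames, t for the rest."""
    clean = set(clean_frames)
    if any(i < 0 or i >= num_frames for i in clean):
        raise IndexError("clean frame index out of range")
    return [0.0 if i in clean else t for i in range(num_frames)]


NUM_FRAMES = 6
T = 1.0  # current noise level for the frames being generated

patterns = {
    # No frame is given: every frame is generated.
    "text_to_video": timestep_vector(NUM_FRAMES, [], T),
    # The first frame is the input image.
    "image_to_video": timestep_vector(NUM_FRAMES, [0], T),
    # First and last keyframes are given; the middle is interpolated.
    "keyframe_interpolation": timestep_vector(NUM_FRAMES, [0, NUM_FRAMES - 1], T),
    # The first half is an existing clip; the second half extends it.
    "video_extension": timestep_vector(NUM_FRAMES, range(3), T),
    # Frames at both ends exist; the gap in the middle is completed.
    "video_completion": timestep_vector(NUM_FRAMES, [0, 1, 4, 5], T),
}

for task, vector in patterns.items():
    print(f"{task:24s} {vector}")
```

Under this view, switching tasks means changing only the timestep vector passed to the model, not the model itself.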
Visual Demos
Below are some animated examples from PUSA V0.5. V1.0 further improves multi-task capabilities and generation quality.
The release of PUSA V1.0 makes video generation technology more accessible and efficient. Its innovative VTA method not only improves quality but also greatly lowers the barrier for development and application.
Related Links
- PUSA V1.0 Model and Introduction (Hugging Face)
- PUSA V1.0 Training Dataset (Hugging Face)
- Official Project Homepage
- Technical Report (PDF)
- arXiv Paper: 2410.03160