SkyReels-V2 Released: Open-Source Model Supporting Infinite-Length Video Generation
The SkyworkAI team recently released SkyReels-V2, an open-source video generation model that can generate cinematic-quality videos of theoretically infinite length. The model is built on a “Diffusion Forcing” framework and supports both text-to-video (T2V) and image-to-video (I2V) generation.
Key Features
SkyReels-V2 brings multiple innovations to the field of video generation:
- Infinite-Length Video Generation: Using diffusion forcing technology, the model can generate videos of theoretically unlimited length
- Multi-Modal Input Support: Supports both text-to-video and image-to-video functionality
- High-Quality Visual Results: In human evaluations, its visual quality approaches that of closed-source commercial models such as Kling-1.6 and Runway Gen-4
- Fully Open-Source and Commercial-Use Friendly: Both the code and the model weights are open-source and can be used in commercial projects
- Video Captioning Model: Also includes SkyCaptioner-V1, a specialized model for video understanding
Model Series
SkyReels-V2 offers multiple model variants with different sizes and resolutions:
- Diffusion Forcing (DF) Models: Specifically designed for infinite-length video generation, available in 1.3B-540P and 14B-720P versions
- Text-to-Video (T2V) Models: Focused on generating high-quality videos from text prompts
- Image-to-Video (I2V) Models: Capable of generating coherent video sequences from input images
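How a Diffusion Forcing model reaches "infinite" length is easiest to see as frame arithmetic. The sketch below assumes a common sliding-window approach: the model generates a fixed-size window of frames, and each later window is conditioned on the last few frames of the previous one. `plan_segments`, `base_frames`, and `overlap` are hypothetical names chosen for illustration. They are not the SkyReels-V2 repository's API.

```python
def plan_segments(total_frames: int, base_frames: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges (end exclusive) for each generation window.

    Hypothetical planner: the first window produces up to `base_frames` frames;
    every later window re-uses the last `overlap` frames of the previous window
    as conditioning and contributes up to (base_frames - overlap) new frames.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= overlap < base_frames:
        raise ValueError("overlap must be in [0, base_frames)")

    windows = [(0, min(base_frames, total_frames))]
    end = windows[0][1]
    while end < total_frames:
        start = end - overlap  # step back to include the conditioning frames
        end = min(start + base_frames, total_frames)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Illustrative numbers only: 257 frames, 97-frame windows, 17-frame overlap.
    print(plan_segments(257, 97, 17))
```

With these illustrative values the planner yields three windows, (0, 97), (80, 177), and (160, 257). Each window after the first adds 80 new frames, so the video can keep growing for as many windows as memory and time allow.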
Technical Highlights
SkyReels-V2 employs several advanced technologies:
- Video Captioner (SkyCaptioner-V1): Fine-tuned from Qwen2.5-VL-7B-Instruct; the team reports that it significantly outperforms existing models at describing video content
- Reinforcement Learning: Improves motion quality, specifically targeting large deformable movements and physically implausible motion
- Diffusion Forcing: A training and sampling strategy that assigns each token its own independent noise level, instead of one shared noise level for the whole clip
- High-Quality Supervised Fine-Tuning: Enhances visual quality through a two-stage fine-tuning process
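To make "independent noise levels for each token" concrete, the toy sketch below treats each video frame as one token. It rests on two stated assumptions. First, noising follows a simple linear path, x_t = (1 - t) * x_0 + t * eps, with t = 0 meaning clean and t = 1 meaning pure noise. Second, `staggered_levels` and its `delay` parameter are a hypothetical schedule invented for illustration. Neither is SkyReels-V2's actual noise schedule or sampler.

```python
import random


def staggered_levels(num_frames: int, step: int, num_steps: int, delay: int) -> list[float]:
    """Per-frame noise levels at one denoising step (hypothetical schedule).

    Frame i starts denoising `delay * i` steps after frame 0, so earlier
    frames are always at least as clean as later ones. delay=0 gives every
    frame the same level, i.e. ordinary full-sequence diffusion.
    """
    levels = []
    for i in range(num_frames):
        progress = step - delay * i
        levels.append(min(1.0, max(0.0, 1.0 - progress / num_steps)))
    return levels


def noise_frames(frames: list[list[float]], levels: list[float], rng: random.Random) -> list[list[float]]:
    """Noise each frame with its own level using x_t = (1 - t) * x0 + t * eps."""
    if len(frames) != len(levels):
        raise ValueError("need exactly one noise level per frame")
    return [
        [(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in frame]
        for frame, t in zip(frames, levels)
    ]


if __name__ == "__main__":
    print(staggered_levels(4, step=2, num_steps=4, delay=1))
    print(noise_frames([[1.0, 2.0], [3.0, 4.0]], [0.0, 1.0], random.Random(0)))
```

With `delay=1`, `staggered_levels(4, 2, 4, 1)` returns [0.5, 0.75, 1.0, 1.0]: the first frame is half denoised while the last two are still pure noise. Because each frame carries its own level, later frames can be denoised while conditioned on cleaner earlier frames. A single shared level cannot express that relationship.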
Performance
In human evaluations, SkyReels-V2 scored well on instruction adherence, consistency, and visual quality:
- In text-to-video tasks, SkyReels-V2 achieved an average score of 3.14, surpassing other open-source models, including Wan2.1-14B
- In image-to-video tasks, SkyReels-V2-I2V achieved an average score of 3.29, approaching commercial closed-source model performance
Hardware Requirements
Note that SkyReels-V2 has relatively high hardware requirements:
- Generating 540P video with the 1.3B model requires approximately 14.7 GB of VRAM
- Generating 540P video with the 14B model requires approximately 43.4 GB of VRAM
- Longer videos and higher resolutions require more memory
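A quick calculation shows why the 14B figure is so much higher. The sketch below makes three assumptions. Weights are stored in a 16-bit format (2 bytes per parameter). GB means 10^9 bytes. The helper names are hypothetical. The only figures taken from the article are the two 540P VRAM numbers above.

```python
# Peak VRAM for 540P generation as quoted in the article (GB).
REPORTED_540P_VRAM_GB = {"1.3B": 14.7, "14B": 43.4}


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the model weights alone, in decimal GB.

    Assumes 16-bit weights (bf16/fp16 = 2 bytes) by default; activations,
    other model components, and framework overhead are not included.
    """
    return num_params * bytes_per_param / 1e9


def variants_that_fit(available_gb: float) -> list[str]:
    """Variants whose reported 540P peak fits in `available_gb` (hypothetical helper)."""
    return [name for name, need in REPORTED_540P_VRAM_GB.items() if need <= available_gb]


if __name__ == "__main__":
    print(weight_memory_gb(14e9))   # weights of the 14B model in 16-bit
    print(weight_memory_gb(1.3e9))  # weights of the 1.3B model in 16-bit
    print(variants_that_fit(24))    # e.g. a 24 GB card
```

Under these assumptions, the 14B model's weights alone take about 28 GB, roughly 65% of the reported 43.4 GB peak. The 1.3B model's weights take only about 2.6 GB, so most of its 14.7 GB peak presumably goes to activations and other components. By the quoted figures, a 24 GB card can run the 1.3B model at 540P but not the 14B model.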
Conclusion
The release of SkyReels-V2 is a significant advance in AI video generation, particularly in long-form video synthesis, and gives creators and developers new options. The team plans to release additional 5B-series models and camera director models, which should bring further improvements.