SkyReels-V2 Released: Open-Source Model Supporting Infinite-Length Video Generation

The SkyworkAI team recently released SkyReels-V2, an open-source video generation model designed to produce cinematic-quality videos of theoretically unlimited length. The model is built on a "Diffusion Forcing" framework and supports both text-to-video (T2V) and image-to-video (I2V) generation.

Key Features

SkyReels-V2 brings multiple innovations to the field of video generation:

Infinite-Length Video Generation: Using diffusion forcing technology, the model can generate videos of theoretically unlimited length
Multi-Modal Input Support: Accepts either a text prompt (text-to-video) or an input image (image-to-video) as the starting point
High-Quality Visual Results: In human evaluations, its visual performance approaches closed-source commercial models like Kling-1.6 and Runway Gen-4
Fully Open-Source and Commercial-Use Friendly: Both the code and the model weights are open-sourced and may be used in commercial projects
Video Captioning Model: Also includes SkyCaptioner-V1, a specialized model for video understanding

Model Series

SkyReels-V2 offers multiple model variants with different sizes and resolutions:

Diffusion Forcing (DF) Models: Specifically designed for infinite-length video generation, available in 1.3B-540P and 14B-720P versions
Text-to-Video (T2V) Models: Focused on generating high-quality videos from text prompts
Image-to-Video (I2V) Models: Capable of generating coherent video sequences from input images
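
To make the idea of "theoretically unlimited length" concrete, the sketch below shows one common way a fixed-window generator can be rolled out over a longer clip: each new window reuses a few trailing frames of the previous one as context. This is an illustration under stated assumptions, not the project's confirmed procedure. The function name plan_windows and the window and overlap values are hypothetical.

```python
def plan_windows(total_frames: int, window: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, that cover total_frames.

    Every window after the first starts `overlap` frames before the previous
    window ended, so it contributes at most `window - overlap` new frames.
    All three parameters are illustrative assumptions.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    ranges = []
    start = 0
    while True:
        end = min(start + window, total_frames)
        ranges.append((start, end))
        if end >= total_frames:
            break
        # Step back so the next window sees the last `overlap` frames as context.
        start = end - overlap
    return ranges


# Hypothetical settings: 257 target frames, 97-frame windows, 17 shared frames.
print(plan_windows(257, 97, 17))  # [(0, 97), (80, 177), (160, 257)]
```

Because each extra window adds a fixed number of new frames, the clip can in principle keep growing, although memory and accumulated drift still limit what is practical.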

Technical Highlights

SkyReels-V2 employs several advanced technologies:

Video Captioner (SkyCaptioner-V1): Fine-tuned from the Qwen2.5-VL-7B-Instruct model and reported to significantly outperform existing models in video content understanding
Reinforcement Learning: Optimizes motion quality, targeting problems with large, deformable movements and with motion that violates physical laws
Diffusion Forcing: An innovative training and sampling strategy allowing independent noise levels for each token
High-Quality Supervised Fine-Tuning: Enhances visual quality through a two-stage fine-tuning process
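
The "independent noise levels for each token" idea behind Diffusion Forcing can be sketched in a few lines of NumPy. This is a simplified illustration, not the model's actual training code: it treats each frame as one token, uses a plain linear blend between data and Gaussian noise, and the names add_noise and history_conditioned are hypothetical.

```python
import numpy as np


def add_noise(latents: np.ndarray, noise_levels, rng: np.random.Generator) -> np.ndarray:
    """Blend every frame with Gaussian noise at its own level in [0, 1].

    latents:      array of shape (frames, channels, height, width)
    noise_levels: one value per frame; 0 keeps the frame clean, 1 is pure noise
    The linear blend is an assumed, simplified noising rule.
    """
    levels = np.asarray(noise_levels, dtype=latents.dtype).reshape(-1, 1, 1, 1)
    noise = rng.standard_normal(latents.shape).astype(latents.dtype)
    return (1.0 - levels) * latents + levels * noise


rng = np.random.default_rng(0)
latents = rng.standard_normal((6, 4, 8, 8)).astype(np.float32)

# Conventional diffusion: one noise level shared by the whole clip.
shared = np.full(6, rng.uniform())

# Diffusion forcing: each frame draws its own noise level.
independent = rng.uniform(size=6)

# At sampling time the same mechanism lets already-generated frames stay
# clean (level 0) while the frames after them start from pure noise (level 1).
history_conditioned = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
noisy = add_noise(latents, history_conditioned, rng)
```

Keeping earlier frames clean while denoising later ones is what lets a model of this kind continue a clip from its own previous output.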

Performance

In human evaluations, SkyReels-V2 achieved excellent results in instruction adherence, consistency, and visual quality:

In text-to-video tasks, SkyReels-V2 achieved an average score of 3.14, surpassing other open-source models including Wan2.1-14B
In image-to-video tasks, SkyReels-V2-I2V achieved an average score of 3.29, approaching commercial closed-source model performance

Hardware Requirements

Note that SkyReels-V2 has relatively high hardware requirements:

Generating 540P video with the 1.3B model requires approximately 14.7 GB of VRAM
Generating 540P video with the 14B model requires approximately 43.4 GB of VRAM
Long video generation or higher resolutions will require additional resources
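
As a quick sanity check before downloading weights, the figures above can be compared against the memory of a given GPU. The helper below is only a sketch: the name models_that_fit and the headroom_gb safety margin are hypothetical, and the numbers cover 540P generation only, so longer or higher-resolution runs would need more.

```python
# Approximate VRAM needed for 540P generation, in GB, as quoted in this article.
VRAM_540P_GB = {"1.3B": 14.7, "14B": 43.4}


def models_that_fit(available_gb: float, headroom_gb: float = 1.0) -> list[str]:
    """List the model sizes whose quoted 540P requirement fits in available_gb.

    headroom_gb is an assumed safety margin for other processes on the GPU.
    """
    return [
        name
        for name, needed_gb in VRAM_540P_GB.items()
        if needed_gb + headroom_gb <= available_gb
    ]


print(models_that_fit(24.0))  # ['1.3B']
print(models_that_fit(80.0))  # ['1.3B', '14B']
```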

The release of SkyReels-V2 represents a significant advancement in AI video generation, particularly in long-form video synthesis, and gives creators and developers new possibilities. Additional 5B series models and camera director models are planned for release, so further developments can be expected.