Alibaba’s Tongyi Lab Releases VACE: Video Creation and Editing Enters Unified Era

April 2, 2025, Hangzhou: Alibaba Group’s Tongyi Lab officially released VACE (Video Creation and Editing Framework), which it describes as the world’s first unified framework for diverse video tasks. The framework integrates multimodal inputs to cover text-to-video generation, video editing, and combinations of these tasks, marking a shift in AI video technology from isolated functions to end-to-end capabilities.

Core Features: The “Swiss Army Knife” of Video

VACE integrates four core functionalities into a unified platform:

  • Text-to-Video (T2V): Generates video from a text description alone. For example, the prompt “a cat playing in the grass” becomes a vivid scene.
  • Reference-to-Video (R2V): Generates content from reference images or video segments, so that specified elements (such as particular characters or scenes) appear in the result.
  • Video-to-Video Editing (V2V): Supports whole-video style changes (such as conversion to a cyberpunk style), recoloring, and the addition of dynamic elements.
  • Masked Video-to-Video Editing (MV2V): Uses spatio-temporal masks for local repairs and frame expansion, blending the modified areas seamlessly with the original video.
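As a rough illustration of what a spatio-temporal mask is, the sketch below treats a clip as a list of frames and a mask as a same-shaped list of per-pixel weights, then blends an edited clip into the original. The blending rule, the function name `composite_with_mask`, and the toy data are assumptions made for illustration only. VACE is a generative model, and its internal handling of masks is not described in this article.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame: rows of pixel values
Video = List[Frame]        # a clip: frames ordered in time


def composite_with_mask(original: Video, generated: Video, mask: Video) -> Video:
    """Blend two clips per pixel and per frame (hypothetical helper).

    mask value 1.0 -> take the generated pixel (region to edit)
    mask value 0.0 -> keep the original pixel (region to preserve)
    Values in between mix the two, which softens the seam.
    """
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("clips and mask must have the same number of frames")
    return [
        [
            [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
            for o_row, g_row, m_row in zip(o_frame, g_frame, m_frame)
        ]
        for o_frame, g_frame, m_frame in zip(original, generated, mask)
    ]


# Two frames of 1x3 pixels. Only the right-most pixel of the second frame
# is marked for editing, so the mask varies in both space and time.
original = [[[0.2, 0.2, 0.2]], [[0.2, 0.2, 0.2]]]
generated = [[[0.9, 0.9, 0.9]], [[0.9, 0.9, 0.9]]]
mask = [[[0.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]]

result = composite_with_mask(original, generated, mask)
```

Here the first frame is returned unchanged and only the single masked pixel of the second frame takes the generated value, which is the sense in which a mask is both spatial (which pixels) and temporal (which frames).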

Most notably, these functions can be combined freely. For example, combining reference-based generation with masked editing enables complex tasks such as object replacement and action transfer, which traditional single-purpose tools cannot handle in one pass.

Technical Breakthroughs: Three Innovative Engines

Video Condition Unit (VCU)

The VCU provides a unified interface for multimodal inputs. It converts heterogeneous data such as text, images, video, and masks into a standardized input stream, removing the need to switch between multiple models as traditional tools require.
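To make the idea of a single standardized input concrete, here is a minimal sketch of how different tasks could be expressed through one container. The article does not specify the VCU's actual layout, so the class `ConditionUnit`, the helper `build_unit`, and the default conventions (blank frames and all-ones masks when no video is given, zero masks for reference images) are assumptions, not VACE's API.

```python
from dataclasses import dataclass
from typing import List, Optional

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def blank(height: int, width: int, value: float) -> Frame:
    """Return a height x width frame filled with a constant."""
    return [[value] * width for _ in range(height)]


@dataclass
class ConditionUnit:
    """Hypothetical container: one prompt plus aligned frame and mask lists."""
    text: str
    frames: List[Frame]
    masks: List[Frame]  # assumed convention: 1.0 = generate, 0.0 = keep

    def __post_init__(self) -> None:
        if len(self.frames) != len(self.masks):
            raise ValueError("frames and masks must have the same length")


def build_unit(
    text: str,
    height: int,
    width: int,
    num_frames: int = 0,
    video: Optional[List[Frame]] = None,
    masks: Optional[List[Frame]] = None,
    references: Optional[List[Frame]] = None,
) -> ConditionUnit:
    # No video supplied: use blank placeholder frames (text-to-video case).
    if video is not None:
        frames = list(video)
    else:
        frames = [blank(height, width, 0.0) for _ in range(num_frames)]
    # No masks supplied: mark every pixel of every frame for generation.
    if masks is not None:
        frame_masks = list(masks)
    else:
        frame_masks = [blank(height, width, 1.0) for _ in frames]
    # Reference images are placed in front and marked "keep".
    refs = list(references) if references is not None else []
    ref_masks = [blank(height, width, 0.0) for _ in refs]
    return ConditionUnit(text, refs + frames, ref_masks + frame_masks)


# Text-to-video: only a prompt and a target length.
t2v = build_unit("a cat playing in the grass", height=2, width=2, num_frames=4)

# Reference-to-video: same call, plus one reference image.
ref = blank(2, 2, 0.5)
r2v = build_unit("the same cat on a beach", height=2, width=2,
                 num_frames=4, references=[ref])
```

The point of the sketch is that every task goes through the same function and yields the same shape of object. Under these assumptions, whole-video and masked editing would just be further calls with `video` and `masks` supplied.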

Concept Decoupling Strategy

This strategy automatically separates elements such as characters, backgrounds, and actions in a video so that each can be edited on its own. For example, the main character can be replaced while the scene is kept, avoiding the continuity breaks that traditional editing can introduce.

Context Adapter Architecture

The core model is rebuilt on the Diffusion Transformer (DiT) architecture and adjusts its generation strategy to the task at hand: it concentrates on fine detail in repair tasks and on overall atmosphere in stylization tasks.

According to the reported test data, 1080p videos generated by VACE score 23% higher on dynamic continuity metrics than comparable products, and editing efficiency in complex scenarios is 40% higher.

Application Scenarios: Reshaping Industry Productivity

  • Content Creation: Short video creators can quickly generate a rough draft from text plus reference images, then refine it through local editing.
  • Film and Television Industry: Automates special effects production and flaw repair. Tests by one film company reportedly showed a 60% reduction in post-production costs.
  • Social Platforms: Supports one-click generation of personalized animated content, and is already integrated into multiple social applications in the Alibaba ecosystem.
  • Education and Training: Teachers can generate instructional videos based on courseware text and images, and students can create interactive learning materials.

Strategic Layout: Milestone for AI To C

This release is an important step in Alibaba’s “AI To C” strategy. Since the Tongyi team was spun off from Alibaba Cloud and integrated into the Smart Information Business Group in late 2024, its productization has accelerated significantly. The launch of VACE fills a gap in consumer-level video creation tools and complements Tongyi Lab’s previously open-sourced ViDoRAG system (79.4% document understanding accuracy), moving toward a closed-loop multimodal AI ecosystem.

A Tongyi Lab representative stated: “VACE will serve as a super intelligent agent entry point, connecting to more Qianwen large model capabilities in the future, ultimately achieving a ‘think it, get it’ creative experience.” VACE is currently available as a preview version, with full commercial availability planned for the third quarter of 2025.