
Wan2.1 ComfyUI Workflow

The Wan2.1 model, open-sourced by Alibaba in February 2025, is a benchmark model in the field of video generation. It is licensed under the Apache 2.0 license and offers two versions: 14B (14 billion parameters) and 1.3B (1.3 billion parameters), covering various tasks including text-to-video (T2V) and image-to-video (I2V).

Additionally, community authors have created GGUF and quantized versions.

This article will guide you through the corresponding workflows related to Wan2.1, including:

  • The native Wan2.1 workflow supported by ComfyUI
  • The version from Kijai
  • The GGUF version from City96
💡

All workflow files used in this tutorial embed the corresponding workflow and model information; drag one into ComfyUI to load both. When the pop-up appears, click to download the corresponding models. If the automatic download fails, refer to the manual installation section to complete the model installation. All output videos are saved to the ComfyUI/output directory. Since Wan2.1 uses separate models for 480P and 720P, the corresponding workflows differ only in the model used and the canvas size, so you can adapt either workflow to the other resolution.

Example of Wan2.1 ComfyUI Native Workflow

The following workflow comes from the official ComfyUI blog. Currently, ComfyUI natively supports Wan2.1. To use the officially supported version, please upgrade your ComfyUI to the latest version. Refer to the section on how to upgrade ComfyUI for guidance. The ComfyUI Wiki has organized the original workflows.

After updating ComfyUI to the latest version, you can find the Wan2.1 workflow template in the menu bar under Workflows -> Workflow Templates.

Wan2.1 Workflow Template

All corresponding workflow files for this version come from Comfy-Org/Wan_2.1_ComfyUI_repackaged.

Among them, Diffusion models are provided in multiple versions. If the version used in this article's native examples is too demanding for your hardware, you can choose a version that suits your needs.

  • i2v stands for image to video model, and t2v stands for text to video model.
  • 14B and 1.3B represent the corresponding parameter amounts; larger values require higher hardware performance.
  • bf16, fp16, and fp8 represent different precisions; higher precision requires higher hardware performance.
    • bf16 may require support from Ampere architecture or higher GPUs.
    • fp16 is more widely supported.
    • fp8 has the lowest precision and hardware requirements, but the effect may also be relatively poorer.
  • Generally, larger file sizes require higher hardware specifications.
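
For illustration, the naming scheme above can also be checked programmatically. The sketch below is a hypothetical helper (not part of any workflow) that splits a model filename into task, parameter count, and precision:

import re

def parse_wan_filename(name: str) -> dict:
    """Illustrative helper: split a Wan2.1 model filename such as
    'wan2.1_t2v_14B_fp16.safetensors' into its naming parts."""
    if "i2v" in name:
        task = "image-to-video"
    elif "t2v" in name:
        task = "text-to-video"
    else:
        task = "unknown"
    size = re.search(r"(14B|1\.3B)", name)
    precision = re.search(r"(bf16|fp16|fp8\w*)", name)
    return {
        "task": task,
        "parameters": size.group(1) if size else "unknown",
        "precision": precision.group(1) if precision else "unknown",
    }

print(parse_wan_filename("wan2.1_t2v_14B_fp16.safetensors"))
# -> {'task': 'text-to-video', 'parameters': '14B', 'precision': 'fp16'}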

1. Wan2.1 Text-to-Video Workflow

1.1 Download Wan2.1 Text-to-Video Workflow File

Download the image below and drag it into ComfyUI, or use the menu bar Workflows -> Open(Ctrl+O) to load the workflow.

Wan2.1 Text-to-Video Workflow

Download the JSON format file.

1.2 Manual Model Installation

If the above workflow file cannot complete the model download, please download the model files below and save them to the corresponding locations.

💡

Each model type below offers multiple files; download only one per type. The ComfyUI Wiki has sorted them from highest to lowest GPU performance requirements. You can visit here to view all model files.

Select one Diffusion models file to download:

Select one version from Text encoders to download:

VAE

File save location

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── wan2.1_t2v_14B_fp16.safetensors         # Or the version you choose
│   ├── text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors  # Or the version you choose
│   └── vae/
│       └── wan_2.1_vae.safetensors
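
If you prefer to script the download, the sketch below uses the huggingface_hub library to fetch the files and copy them into place. The repository-side paths (split_files/...) are assumptions based on the repository layout at the time of writing; swap in the model versions you chose above.

import shutil
from pathlib import Path

from huggingface_hub import hf_hub_download

REPO = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"
COMFY = Path("ComfyUI")  # path to your ComfyUI installation

# Repo-side paths are assumptions; adjust to the versions you chose above.
FILES = {
    "split_files/diffusion_models/wan2.1_t2v_14B_fp16.safetensors": "models/diffusion_models",
    "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors": "models/text_encoders",
    "split_files/vae/wan_2.1_vae.safetensors": "models/vae",
}

for repo_path, target_dir in FILES.items():
    cached = hf_hub_download(repo_id=REPO, filename=repo_path)  # goes to the HF cache
    dest = COMFY / target_dir / Path(repo_path).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest)  # place the file where ComfyUI expects it
    print(dest)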

1.3 Steps to Run the Workflow

ComfyUI Wan2.1 Workflow Steps

  1. Ensure that the Load Diffusion Model node has loaded the wan2.1_t2v_1.3B_fp16.safetensors model (or the version you downloaded).
  2. Ensure that the Load CLIP node has loaded the umt5_xxl_fp8_e4m3fn_scaled.safetensors model.
  3. Ensure that the Load VAE node has loaded the wan_2.1_vae.safetensors model.
  4. You can enter the video description content you want to generate in the CLIP Text Encoder node.
  5. Click the Queue button, or use the shortcut Ctrl(cmd) + Enter to execute the video generation.
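
The same run can also be queued programmatically instead of pressing Queue. A minimal sketch, assuming a default local ComfyUI server at 127.0.0.1:8188 and the workflow exported in API format (the filename wan2.1_t2v.json is a placeholder):

import json
import urllib.request

# Load a workflow that was exported in API format (placeholder filename).
with open("wan2.1_t2v.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# POST it to the ComfyUI server's /prompt endpoint; this is equivalent to
# pressing the Queue button in the UI.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # the response includes a prompt_id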

2. Wan2.1 Image-to-Video Workflow

2.1 Wan2.1 Image-to-Video 14B Workflow

Workflow File Download Please click the button below to download the corresponding workflow, then drag it into the ComfyUI interface or use the menu bar Workflows -> Open(Ctrl+O) to load it.

Wan2.1 Image-to-Video Workflow 14B 480P Workflow

Download the JSON format file.

The 720P version of this workflow is basically the same as the 480P version; it only uses a different diffusion model and different dimensions in the WanImageToVideo node (typically 832x480 for 480P and 1280x720 for 720P).

Download the image below as the input image. Wan2.1 Image-to-Video Workflow 14B 480P Workflow Input Image Example

2.2 Manual Model Download

If the above workflow file cannot complete the model download, please download the model files below and save them to the corresponding locations.

Diffusion models

720P version

480P version

Text encoders

VAE

CLIP Vision

File save location

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── wan2.1_i2v_480p_14B_fp16.safetensors    # Or the version you choose
│   ├── text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors  # Or the version you choose
│   ├── vae/
│   │   └── wan_2.1_vae.safetensors
│   └── clip_vision/
│       └── clip_vision_h.safetensors
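
Before launching, you can sanity-check that every file from the tree above is in place. A small hypothetical pre-flight script; adjust the names to the versions you actually downloaded:

from pathlib import Path

COMFY = Path("ComfyUI")  # path to your ComfyUI installation
REQUIRED = [
    "models/diffusion_models/wan2.1_i2v_480p_14B_fp16.safetensors",
    "models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "models/vae/wan_2.1_vae.safetensors",
    "models/clip_vision/clip_vision_h.safetensors",
]

# Report each required file as present or missing.
for rel in REQUIRED:
    status = "ok" if (COMFY / rel).is_file() else "MISSING"
    print(f"{status:8} {rel}")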

2.3 Steps to Run the Workflow

ComfyUI Wan2.1 Workflow Steps

  1. Ensure that the Load Diffusion Model node has loaded the wan2.1_i2v_480p_14B_fp16.safetensors model
  2. Ensure that the Load CLIP node has loaded the umt5_xxl_fp8_e4m3fn_scaled.safetensors model
  3. Ensure that the Load VAE node has loaded the wan_2.1_vae.safetensors model
  4. Ensure that the Load CLIP Vision node has loaded the clip_vision_h.safetensors model
  5. Load the input image in the Load Image node
  6. Input the content you want to generate in the CLIP Text Encoder node, or use the example in the workflow
  7. Click the Queue button, or use the shortcut Ctrl(cmd) + Enter to execute the video generation
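
If you script these steps, the input image and prompt can be patched into the API-format workflow JSON before queueing it. A sketch under the assumption that the export is named wan2.1_i2v_480p.json; node titles vary per workflow, so treat the lookups below as illustrative:

import json

# Load the API-format export of the image-to-video workflow
# (wan2.1_i2v_480p.json is a placeholder filename).
with open("wan2.1_i2v_480p.json", "r", encoding="utf-8") as f:
    wf = json.load(f)

for node in wf.values():
    if node.get("class_type") == "LoadImage":
        # Filename of an image already placed in ComfyUI/input
        node["inputs"]["image"] = "input_example.png"
    elif node.get("class_type") == "CLIPTextEncode":
        # Positive and negative prompts use the same node type; here we key
        # off the node title, which is workflow-dependent.
        title = node.get("_meta", {}).get("title", "")
        if "Positive" in title:
            node["inputs"]["text"] = "a cat walking through tall grass"

with open("wan2.1_i2v_480p_patched.json", "w", encoding="utf-8") as f:
    json.dump(wf, f, indent=2)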

Kijai Wan2.1 Quantized Version Workflow

This version is provided by Kijai and requires three custom nodes.

Please install the corresponding three custom nodes before starting. You can refer to the ComfyUI Custom Node Installation Tutorial for guidance.

Model repository: Kijai/WanVideo_comfy

The repository provides multiple model versions; select one appropriate for your device. Generally, larger files give better results but require more capable hardware.

💡

If the ComfyUI native workflow runs well on your device, you can also use the models provided by Comfy Org; in this section I will use the models provided by Kijai for the examples.

1. Kijai Text-to-Video Workflow

1.1 Kijai Wan2.1 Text-to-Video Workflow Download

Please click the button below to download the corresponding workflow, then drag it into the ComfyUI interface or use the menu bar Workflows -> Open(Ctrl+O) to load it.

The two workflow files are basically the same, but the second file has optional notes.

1.2 Manual Model Installation

💡

Visit https://huggingface.co/Kijai/WanVideo_comfy/tree/main to view the file sizes. Generally, larger files give better results but require more capable hardware.

Diffusion models

Text encoders

VAE

File save location

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── Wan2_1-T2V-14B_fp8_e4m3fn.safetensors   # Or the version you choose
│   ├── text_encoders/
│   │   └── umt5-xxl-enc-bf16.safetensors           # Or the version you choose
│   └── vae/
│       └── Wan2_1_VAE_bf16.safetensors             # Or the version you choose

1.3 Steps to Run the Workflow

Wan2.1 Text-to-Video Workflow Steps

Ensure that each node has loaded its corresponding model; use the version you downloaded.

  1. Ensure the WanVideo Vae Loader node has loaded the Wan2_1_VAE_bf16.safetensors model
  2. Ensure the WanVideo Model Loader node has loaded the Wan2_1-T2V-14B_fp8_e4m3fn.safetensors model
  3. Ensure the Load WanVideo T5 TextEncoder node has loaded the umt5-xxl-enc-bf16.safetensors model
  4. Input the content you want to generate in the WanVideo TextEncode node
  5. Click the Queue button, or use the shortcut Ctrl(cmd) + Enter to execute the video generation

You can modify the size in the WanVideo Empty Embeds node to change the dimensions of the generated video.

2. Kijai Wan2.1 Image-to-Video Workflow

2.1 Workflow File Download

Download the image below as the input image. ComfyUI Wan2.1 Image-to-Video Workflow Input Image

2.2 Manual Model Download

💡

You can also use the models from the ComfyUI native section; it appears that only the text encoder is incompatible.

Diffusion models

720P version

480P version

Text encoders

VAE

CLIP Vision

File save location

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── Wan2_1-I2V-14B-720P_fp8_e4m3fn.safetensors  # Or the version you choose
│   ├── text_encoders/
│   │   └── umt5-xxl-enc-bf16.safetensors               # Or the version you choose
│   ├── vae/
│   │   └── Wan2_1_VAE_fp32.safetensors                 # Or the version you choose
│   └── clip_vision/
│       └── clip_vision_h.safetensors

2.3 Steps to Run the Workflow

Wan2.1 Quantized Version Image-to-Video 480P Workflow Diagram

Refer to the numbers in the image and make sure each node has loaded the corresponding model so that the workflow can run normally.

  1. Ensure the WanVideo Model Loader node has loaded the Wan2_1-I2V-14B-720P_fp8_e4m3fn.safetensors model
  2. Ensure the Load WanVideo T5 TextEncoder node has loaded the umt5-xxl-enc-bf16.safetensors model
  3. Ensure the WanVideo Vae Loader node has loaded the Wan2_1_VAE_fp32.safetensors model
  4. Ensure the Load CLIP Vision node has loaded the clip_vision_h.safetensors model
  5. Load the input image in the Load Image node
  6. Keep the default prompt or modify it in the WanVideo TextEncode node to adjust the output
  7. Click the Queue button, or use the shortcut Ctrl(cmd) + Enter to execute the video generation

Wan2.1 GGUF Version Workflow

This part uses the GGUF version of the models to complete the video generation. Model repository: https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main

We need ComfyUI-GGUF to load the corresponding models; please install this custom node before starting. You can refer to the ComfyUI Custom Node Installation Tutorial for guidance.

💡

This workflow is basically the same as the ComfyUI native version, except that it uses a GGUF model and the corresponding GGUF loader node. I still provide the complete model list in this section for readers who jump straight here.

1. Wan2.1 GGUF Version Text-to-Video Workflow

1.1 Workflow File Download

Wan2.1 GGUF Version Text-to-Video Workflow

1.2 Manual Model Download

Select one Diffusion model file to download from the following list. city96 provides multiple versions; visit https://huggingface.co/city96/Wan2.1-T2V-14B-gguf/tree/main to choose one suitable for you. Generally, the larger the file, the better the results, but the higher the hardware requirements.
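
If you would rather compare the available quantizations without opening the browser, the huggingface_hub library can list the repository contents. A minimal sketch:

from huggingface_hub import list_repo_files

# List the .gguf quantizations available in city96's repository.
files = [f for f in list_repo_files("city96/Wan2.1-T2V-14B-gguf") if f.endswith(".gguf")]
for name in sorted(files):
    print(name)  # e.g. wan2.1-t2v-14b-Q4_K_M.gguf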

Select one version from Text encoders to download:

VAE

File save location

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── wan2.1-t2v-14b-Q4_K_M.gguf              # Or the version you choose
│   ├── text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors  # Or the version you choose
│   └── vae/
│       └── wan_2.1_vae.safetensors

1.3 Steps to Run the Workflow

Wan2.1 GGUF Version Text-to-Video Workflow

  1. Ensure the Unet Loader (GGUF) node has loaded the wan2.1-t2v-14b-Q4_K_M.gguf model
  2. Ensure the Load CLIP node has loaded the umt5_xxl_fp8_e4m3fn_scaled.safetensors model
  3. Ensure the Load VAE node has loaded the wan_2.1_vae.safetensors model
  4. Enter the description of the video you want to generate in the CLIP Text Encoder node
  5. Click the Queue button, or use the shortcut Ctrl(cmd) + Enter to execute the video generation

2. Wan2.1 GGUF Version Image-to-Video Workflow

2.1 Workflow File Download

Wan2.1 GGUF 720P Image-to-Video Workflow

2.2 Manual Model Download

Select one Diffusion model file to download from the following list. city96 provides multiple versions; visit the corresponding repository to choose one suitable for you. Generally, the larger the file, the better the results, but the higher the hardware requirements.

Here I use the wan2.1-i2v-14b-Q4_K_M.gguf model to complete the example.

Select one version from Text encoders to download:

VAE

File save location

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── wan2.1-i2v-14b-Q4_K_M.gguf              # Or the version you choose
│   ├── text_encoders/
│   │   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors  # Or the version you choose
│   └── vae/
│       └── wan_2.1_vae.safetensors

2.3 Steps to Run the Workflow

Wan2.1 GGUF Version Image-to-Video Workflow

  1. Ensure the Unet Loader(GGUF) node has loaded the wan2.1-i2v-14b-Q4_K_M.gguf model
  2. Ensure the Load CLIP node has loaded the umt5_xxl_fp8_e4m3fn_scaled.safetensors model
  3. Ensure the Load VAE node has loaded the wan_2.1_vae.safetensors model
  4. Ensure the Load CLIP Vision node has loaded the clip_vision_h.safetensors model
  5. Load the input image in the Load Image node
  6. Input the content you want to generate in the CLIP Text Encoder node, or use the example in the workflow
  7. Click the Queue button, or use the shortcut Ctrl(cmd) + Enter to execute the video generation

Frequently Asked Questions

How to save videos in MP4 format

The video generation workflows above save videos in .webp format by default. If you want another format, you can try the Video Combine node from the ComfyUI-VideoHelperSuite plugin to save as an MP4 video. Video Output Format
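
Alternatively, an already-saved .webp can be re-encoded outside ComfyUI. A minimal sketch, assuming Pillow and imageio[ffmpeg] are installed; the input filename is a placeholder for whatever the workflow saved to ComfyUI/output:

import imageio
import numpy as np
from PIL import Image, ImageSequence

# Read the animated .webp saved by the workflow (placeholder filename)
# and re-encode it as MP4. fps should match the workflow's frame rate;
# Wan2.1 generates at 16 fps by default.
src = Image.open("ComfyUI/output/ComfyUI_00001_.webp")
frames = [np.asarray(frame.convert("RGB")) for frame in ImageSequence.Iterator(src)]
imageio.mimsave("output.mp4", frames, fps=16)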

All models are available for download on Hugging Face and ModelScope.