
ComfyUI Frame Pack Workflow Complete Step-by-Step Tutorial

FramePack is an AI video generation technology developed by the team of Dr. Lvmin Zhang (the author of ControlNet) at Stanford University. Its main features include:

  • Dynamic Context Compression: Video frames are ranked by importance; key frames keep 1536 feature tokens, while transitional frames are compressed to 192 (a rough calculation illustrating the effect follows this list).
  • Drift-resistant Sampling: Utilizing bidirectional memory methods and reverse generation techniques to avoid image drift and ensure action continuity.
  • Reduced VRAM Requirements: Lowers the VRAM threshold for video generation from professional-grade hardware (12GB+) to consumer-level cards (only 6GB of VRAM), so even an RTX 3060 laptop can generate high-quality videos up to 60 seconds long.
  • Open Source and Integration: FramePack is currently open-sourced and integrated into Tencent’s Hunyuan video model, supporting multimodal inputs (text + images + voice) and real-time interactive generation.
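
To get a sense of what this compression means in practice, here is a rough, purely illustrative calculation based only on the token counts quoted above; the two-tier split, frame count, and number of key frames are made-up numbers for the example, not FramePack's actual schedule:

# Rough illustration only: compare a naive context (every frame kept at full
# resolution) with the two-tier compression described above.
frames = 30 * 60                  # hypothetical 60-second clip at 30 fps
key_tokens, transition_tokens = 1536, 192
key_frames = 16                   # hypothetical number of frames kept at full resolution

naive = frames * key_tokens
compressed = key_frames * key_tokens + (frames - key_frames) * transition_tokens

print(f"naive context:      {naive:,} tokens")       # 2,764,800
print(f"compressed context: {compressed:,} tokens")  # 367,104

Keeping the per-frame context bounded in this way is what allows longer videos without the memory cost growing linearly with every additional frame.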

Corresponding Prompt

lllyasviel provides a GPT prompt for video generation in the corresponding repository. If you are unsure how to write prompts while using the Frame Pack workflow, you can try the following:

  1. Copy the prompt below and send it to GPT.
  2. Once GPT understands the requirements, provide it with the corresponding images, and you will receive the appropriate prompts.
You are an assistant that writes short, motion-focused prompts for animating images.

When the user sends an image, respond with a single, concise prompt describing visual motion (such as human activity, moving objects, or camera movements). Focus only on how the scene could come alive and become dynamic using brief phrases.

Larger and more dynamic motions (like dancing, jumping, running, etc.) are preferred over smaller or more subtle ones (like standing still, sitting, etc.).

Describe subject, then motion, then other things. For example: "The girl dances gracefully, with clear movements, full of charm."

If there is something that can dance (like a man, girl, robot, etc.), then prefer to describe it as dancing.

Stay in a loop: one image in, one motion prompt out. Do not explain, ask questions, or generate multiple options.

Current Implementation of Frame Pack in ComfyUI

Currently, three custom node authors have implemented Frame Pack capabilities in ComfyUI:

  • Kijai (ComfyUI-FramePackWrapper)
  • HM-RunningHub
  • TTPlanetPig

Differences Between These Custom Nodes

Below we explain the differences in the workflows implemented by these custom nodes.

Kijai’s Custom Plugin

Kijai has repackaged the corresponding models. You have probably used Kijai’s related custom nodes before; thanks to him for delivering such rapid updates!

Kijai’s version does not appear to be registered with the ComfyUI Manager, so it currently cannot be installed through the Manager’s Custom Nodes Manager. You need to install it via the Manager’s Git option or manually.

Features:

  • Supports video generation with first and last frames
  • Requires installation via Git or manual installation
  • Models are reusable

HM-RunningHub and TTPlanetPig’s Custom Plugins

These two custom nodes are based on the same code: HM-RunningHub created the original, and TTPlanetPig then implemented first- and last-frame video generation on top of that plugin’s source code. You can check this PR.

Both custom nodes use the same model folder structure and rely on the original repository model files, which have not been repackaged. These files therefore cannot be used by other custom nodes that do not support this folder structure, which leads to greater disk space usage.

Features:

  • Supports video generation with first and last frames
  • The downloaded model files may not be reusable in other nodes or workflows
  • Takes up more disk space because the model files are not repackaged
  • Some compatibility issues with dependencies
💡

Additionally, I encountered errors while running these plugins and have so far been unable to resolve the compatibility issues, so this article only provides the relevant information for reference. Since first and last frames can also be handled in Kijai’s version, these two plugins are supplementary; I recommend Kijai’s version as the first choice.

Kijai ComfyUI-FramePackWrapper FLF2V ComfyUI Workflow

1. Plugin Installation

For ComfyUI-FramePackWrapper, you may need to install it using the Manager’s Git:

Install via Git
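
If you would rather install it manually, the usual pattern is to clone the repository into custom_nodes and install its requirements; the GitHub URL below assumes the repository name matches the plugin name, so adjust it (and your installation path) if needed:

cd <your installation path>/ComfyUI/custom_nodes
git clone https://github.com/kijai/ComfyUI-FramePackWrapper.git
cd ComfyUI-FramePackWrapper
pip install -r requirements.txt

Restart ComfyUI afterwards so that the new nodes are picked up.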

Here are some articles you might find useful:

2. Workflow File Download

Download the video file below and drag it into ComfyUI to load the corresponding workflow. I have added the model information in the file, which will prompt you to download the model.

💡

Due to recent updates to the ComfyUI frontend, please make sure your frontend version is 1.16.9 or later; otherwise this workflow may lose widgets after loading. For details, please visit: Widget disappears and cannot be set or adjusted after importing ComfyUI workflow
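
If your frontend turns out to be older and you run ComfyUI from source, where the frontend is delivered as the comfyui-frontend-package pip package (an assumption about your setup; portable builds update differently), upgrading it inside ComfyUI's Python environment usually looks like this:

pip install --upgrade comfyui-frontend-package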

Video Preview

Download the images below, which we will use as image inputs.

[Input image: first frame]

[Input image: last frame]

3. Manual Model Installation

If you are unable to successfully download the models in the workflow, please download the models below and save them to the corresponding location.

CLIP Vision

VAE

Text Encoder

Diffusion Model

Kijai provides two versions with different precisions. You can choose one to download based on your graphics card performance.

File Name                               Precision  Size    Download Link  Graphics Card Requirement
FramePackI2V_HY_bf16.safetensors        bf16       25.7GB  Download Link  High
FramePackI2V_HY_fp8_e4m3fn.safetensors  fp8        16.3GB  Download Link  Low

File Save Location

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── FramePackI2V_HY_fp8_e4m3fn.safetensors  # or the bf16 precision version
│   ├── 📂 text_encoders/
│   │   ├── clip_l.safetensors
│   │   └── llava_llama3_fp16.safetensors
│   ├── 📂 clip_vision/
│   │   └── sigclip_vision_patch14_384.safetensors
│   └── 📂 vae/
│       └── hunyuan_video_vae_bf16.safetensors

4. Complete the Workflow Step by Step

Workflow Step Guide

  1. Ensure that the Load FramePackModel node has loaded the FramePackI2V_HY_fp8_e4m3fn.safetensors model.
  2. Ensure that the DualCLIPLoader node has loaded:
    • The clip_l.safetensors model
    • The llava_llama3_fp16.safetensors model
  3. Ensure that the Load CLIP Vision node has loaded the sigclip_vision_patch14_384.safetensors model.
  4. You can load the hunyuan_video_vae_bf16.safetensors model in the Load VAE node.
  5. (Optional, if using my input images) Modify the Prompt parameter in the CLIP Text Encoder node to input the video description you want to generate.
  6. In the Load Image node that handles the first_frame input, load first_frame.jpg.
  7. In the Load Image node that handles the last_frame input, load last_frame.jpg (if you do not need the last frame, you can delete this node or disable it with Bypass).
  8. In the FramePackSampler node, you can modify the total_second_length parameter to change the duration of the video; in my workflow, it is set to 5 seconds, and you can adjust it according to your needs.
  9. Click the Run button or use the shortcut Ctrl(cmd) + Enter to execute the video generation.

If you do not need the last frame, please bypass the entire input processing related to last_frame.

Workflow Step Guide

HM-RunningHub and TTPlanetPig’s Custom Plugins

These two plugins use the same model storage location, but as I mentioned earlier, they download the entire original repository, which needs to be saved in a specified location. This prevents other plugins from reusing these models, leading to some wasted disk space. However, they do implement the generation of first and last frames, so you can try them out if you want.

⚠️

While running the workflows of these two custom nodes, I encountered a list index out of range error. You can check this issue. The discussion there suggests the likely cause is:

β€œThe version of torchvision you are using is likely incompatible with the version of PyAV you have installed.”

However, after trying the methods mentioned in the issue, I still couldn’t resolve the problem. Therefore, I can only provide the relevant tutorial information here. If you manage to solve the issue, please feel free to provide feedback. I recommend checking this issue to see if anyone has proposed similar solutions.
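
If you hit the same error, a useful first step is to print the exact torch / torchvision / PyAV versions in the Python environment ComfyUI runs with, so you can compare them against the combinations reported in the issue; a minimal check looks like this:

# Run inside ComfyUI's Python environment to see which versions are installed.
import torch
import torchvision
import av  # PyAV

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("PyAV:", av.__version__)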

Plugin Installation

  1. You can choose to install one of the following plugins, or both; their nodes differ, but each is simple to use, with only a single node:
  2. Enhance the video editing experience in ComfyUI:

If you have built video-related workflows before, you have most likely used VideoHelperSuite already; it remains essential for extending ComfyUI’s video capabilities.

1. Model Download

HM-RunningHub provides a Python script to download all the models; you just need to run it and follow the prompts. My approach is to save the code below as download_models.py, place it in the ComfyUI/models directory, and then run python download_models.py from that directory in the terminal.

cd <your installation path>/ComfyUI/models/

Then run the script:

python download_models.py

This requires that the huggingface_hub package is installed in the Python environment (virtual or system) that you use to run the script.
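
If it is not installed yet, you can add it with pip in that same environment:

pip install huggingface_hub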

from huggingface_hub import snapshot_download
 
# Download HunyuanVideo model
snapshot_download(
    repo_id="hunyuanvideo-community/HunyuanVideo",
    local_dir="HunyuanVideo",
    ignore_patterns=["transformer/*", "*.git*", "*.log*", "*.md"],
    local_dir_use_symlinks=False
)
 
# Download flux_redux_bfl model
snapshot_download(
    repo_id="lllyasviel/flux_redux_bfl",
    local_dir="flux_redux_bfl",
    ignore_patterns=["*.git*", "*.log*", "*.md"],
    local_dir_use_symlinks=False
)
 
# Download FramePackI2V_HY model
snapshot_download(
    repo_id="lllyasviel/FramePackI2V_HY",
    local_dir="FramePackI2V_HY",
    ignore_patterns=["*.git*", "*.log*", "*.md"],
    local_dir_use_symlinks=False
)
 

You can also manually download the models listed below and save them to the corresponding locations; this means downloading all of the files from each repository.
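
Alternatively, if you prefer one command per repository over either the script or the browser, huggingface_hub also ships a command-line tool. The repo IDs below are the same as in the script above; the --exclude flag (available in reasonably recent huggingface_hub versions) plays the role of the script's ignore_patterns for the large HunyuanVideo transformer folder. Run the commands from inside ComfyUI/models so the folders end up in the same place:

huggingface-cli download hunyuanvideo-community/HunyuanVideo --local-dir HunyuanVideo --exclude "transformer/*"
huggingface-cli download lllyasviel/flux_redux_bfl --local-dir flux_redux_bfl
huggingface-cli download lllyasviel/FramePackI2V_HY --local-dir FramePackI2V_HY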

File Save Location

comfyui/models/
├── flux_redux_bfl
│   ├── feature_extractor
│   │   └── preprocessor_config.json
│   ├── image_embedder
│   │   ├── config.json
│   │   └── diffusion_pytorch_model.safetensors
│   ├── image_encoder
│   │   ├── config.json
│   │   └── model.safetensors
│   ├── model_index.json
│   └── README.md
├── FramePackI2V_HY
│   ├── config.json
│   ├── diffusion_pytorch_model-00001-of-00003.safetensors
│   ├── diffusion_pytorch_model-00002-of-00003.safetensors
│   ├── diffusion_pytorch_model-00003-of-00003.safetensors
│   ├── diffusion_pytorch_model.safetensors.index.json
│   └── README.md
└── HunyuanVideo
    ├── config.json
    ├── model_index.json
    ├── README.md
    ├── scheduler
    │   └── scheduler_config.json
    ├── text_encoder
    │   ├── config.json
    │   ├── model-00001-of-00004.safetensors
    │   ├── model-00002-of-00004.safetensors
    │   ├── model-00003-of-00004.safetensors
    │   ├── model-00004-of-00004.safetensors
    │   └── model.safetensors.index.json
    ├── text_encoder_2
    │   ├── config.json
    │   └── model.safetensors
    ├── tokenizer
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── tokenizer.json
    ├── tokenizer_2
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    └── vae
        ├── config.json
        └── diffusion_pytorch_model.safetensors

2. Download Workflow

HM-RunningHub

FramePack_regular.json

FramePack_endimage.json

TTPlanetPig

TTP_FramePack_Start_End_Image_example.png

TTP_FramePack_Start_End_Image_example.json