
HunyuanVideo Text-to-Video Workflow Guide and Examples

This tutorial will provide detailed instructions on how to use Tencent’s Hunyuan Video model in ComfyUI for text-to-video generation. We’ll guide you through the process step by step, starting with environment setup.

Hardware Requirements

Before getting started, please ensure your system meets these minimum requirements:

  • GPU: NVIDIA GPU with CUDA support
    • Minimum: 60GB VRAM (for generating 720×1280 video at 129 frames)
    • Recommended: 80GB VRAM (for better generation quality)
    • Minimum usable: 45GB VRAM (for generating 544×960 video at 129 frames)
  • Operating System: Linux (official test environment)
  • CUDA Version: CUDA 11.8 or 12.0+ recommended

Hardware requirements source: https://huggingface.co/tencent/HunyuanVideo
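
If you want to verify this before starting, the short PyTorch snippet below (PyTorch ships with ComfyUI's Python environment) reports the detected GPU and its total VRAM; the thresholds are simply copied from the list above.

```python
# Quick check of the local GPU against the VRAM guidance above.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA-capable GPU detected.")

props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, total VRAM: {total_gb:.1f} GB")

if total_gb < 45:
    print("Below the ~45GB minimum usable for 544x960, 129-frame generation.")
elif total_gb < 60:
    print("Enough for 544x960x129f, but below the 60GB guideline for 720x1280x129f.")
else:
    print("Meets the 720p (720x1280x129f) guidance.")
```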

1. Install and Update ComfyUI to the Latest Version

If you haven’t installed ComfyUI yet, please refer to these sections:

  • ComfyUI Installation Guide
  • ComfyUI Update Guide

You’ll need to install and update ComfyUI to the latest version to access the ‘EmptyHunyuanLatentVideo’ node.

2. Model Download and Installation

HunyuanVideo requires the following model files:

2.1 Main Model File

Download the following file from HunyuanVideo Main Model Download Page:

| Filename | Size | Directory |
| --- | --- | --- |
| hunyuan_video_t2v_720p_bf16.safetensors | ~25.6GB | ComfyUI/models/diffusion_models |

2.2 Text Encoder Files

Download the following files from HunyuanVideo Text Encoder Download Page:

| Filename | Size | Directory |
| --- | --- | --- |
| clip_l.safetensors | ~246MB | ComfyUI/models/text_encoders |
| llava_llama3_fp8_scaled.safetensors | ~9.09GB | ComfyUI/models/text_encoders |

2.3 VAE Model File

Download the following file from HunyuanVideo VAE Download Page:

| Filename | Size | Directory |
| --- | --- | --- |
| hunyuan_video_vae_bf16.safetensors | ~493MB | ComfyUI/models/vae |

Model Directory Structure Reference

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   └── hunyuan_video_t2v_720p_bf16.safetensors  # Main model file
│   ├── text_encoders/
│   │   ├── clip_l.safetensors                       # CLIP text encoder
│   │   └── llava_llama3_fp8_scaled.safetensors      # LLaVA text encoder
│   └── vae/
│       └── hunyuan_video_vae_bf16.safetensors       # VAE model file
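
If you prefer to script the downloads instead of fetching each file manually, the sketch below uses the huggingface_hub client. The repository ID and in-repo paths are assumptions (they follow the Comfy-Org repackaged layout); verify them against the download pages linked above, then move the files into the directories shown in the tree.

```python
# Download sketch using huggingface_hub.
# NOTE: repo ID and in-repo paths are assumptions; confirm them against the
# official download pages before running.
from huggingface_hub import hf_hub_download

REPO_ID = "Comfy-Org/HunyuanVideo_repackaged"  # assumed repository
FILES = {
    "split_files/diffusion_models/hunyuan_video_t2v_720p_bf16.safetensors": "ComfyUI/models/diffusion_models",
    "split_files/text_encoders/clip_l.safetensors": "ComfyUI/models/text_encoders",
    "split_files/text_encoders/llava_llama3_fp8_scaled.safetensors": "ComfyUI/models/text_encoders",
    "split_files/vae/hunyuan_video_vae_bf16.safetensors": "ComfyUI/models/vae",
}

for repo_path, target_dir in FILES.items():
    local_path = hf_hub_download(repo_id=REPO_ID, filename=repo_path, local_dir="hf_downloads")
    print(f"Downloaded {local_path} -> place it in {target_dir}/")
```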

3. Workflow File Download

Download the workflow file in raw JSON format:

Workflow file source: HunyuanVideo Workflow Download

Basic Video Generation Workflow

HunyuanVideo supports the following resolution settings:

| Resolution | 9:16 | 16:9 | 4:3 | 3:4 | 1:1 |
| --- | --- | --- | --- | --- | --- |
| 540p | 544×960×129f | 960×544×129f | 624×832×129f | 832×624×129f | 720×720×129f |
| 720p (Recommended) | 720×1280×129f | 1280×720×129f | 1104×832×129f | 832×1104×129f | 960×960×129f |
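
If you build or edit workflows programmatically, the chart above can also be kept as a simple lookup table; each tuple is (width, height, frame count) and feeds directly into the EmptyHunyuanLatentVideo node described in the next section.

```python
# Resolution chart as a lookup table: (width, height, frames).
RESOLUTIONS = {
    ("540p", "9:16"): (544, 960, 129),
    ("540p", "16:9"): (960, 544, 129),
    ("540p", "4:3"):  (624, 832, 129),
    ("540p", "3:4"):  (832, 624, 129),
    ("540p", "1:1"):  (720, 720, 129),
    ("720p", "9:16"): (720, 1280, 129),
    ("720p", "16:9"): (1280, 720, 129),
    ("720p", "4:3"):  (1104, 832, 129),
    ("720p", "3:4"):  (832, 1104, 129),
    ("720p", "1:1"):  (960, 960, 129),
}

width, height, frames = RESOLUTIONS[("720p", "16:9")]
```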

4. Workflow Node Explanation

4.1 Model Loading Nodes

  1. UNETLoader

    • Purpose: Load the main model file
    • Parameters:
      • Model: hunyuan_video_t2v_720p_bf16.safetensors
      • Weight Type: default (can choose fp8 type if memory is insufficient)
  2. DualCLIPLoader

    • Purpose: Load text encoder models
    • Parameters:
      • CLIP 1: clip_l.safetensors
      • CLIP 2: llava_llama3_fp8_scaled.safetensors
      • Text Encoder: hunyuan_video
  3. VAELoader

    • Purpose: Load VAE model
    • Parameters:
      • VAE Model: hunyuan_video_vae_bf16.safetensors
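
For readers who script ComfyUI through its API-format prompt JSON (the file produced by "Save (API Format)"), the three loaders above look roughly like the Python dict below. This is only a sketch: node IDs are arbitrary and the input names follow current ComfyUI node definitions, so they may differ slightly between versions.

```python
# API-format sketch of the model loading nodes (node IDs are arbitrary).
loaders = {
    "1": {"class_type": "UNETLoader", "inputs": {
        "unet_name": "hunyuan_video_t2v_720p_bf16.safetensors",
        "weight_dtype": "default",   # pick an fp8 option here if VRAM is tight
    }},
    "2": {"class_type": "DualCLIPLoader", "inputs": {
        "clip_name1": "clip_l.safetensors",
        "clip_name2": "llava_llama3_fp8_scaled.safetensors",
        "type": "hunyuan_video",
    }},
    "3": {"class_type": "VAELoader", "inputs": {
        "vae_name": "hunyuan_video_vae_bf16.safetensors",
    }},
}
```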

4.2 Key Video Generation Nodes

  1. EmptyHunyuanLatentVideo

    • Purpose: Create video latent space
    • Parameters:
      • Width: Video width (e.g., 848)
      • Height: Video height (e.g., 480)
      • Frame Count: Number of frames (e.g., 73)
      • Batch Size: Batch size (default 1)
  2. CLIPTextEncode

    • Purpose: Text prompt encoding
    • Parameters:
      • Text: Positive prompts (describe what you want to generate)
      • Detailed English descriptions are recommended
  3. FluxGuidance

    • Purpose: Control generation guidance strength
    • Parameters:
      • Guidance Scale: Guidance strength (default 6.0)
      • Higher values make results closer to prompts but may affect video quality
  4. KSamplerSelect

    • Purpose: Select sampler
    • Parameters:
      • Sampler: Sampling method (default euler)
      • Other options: euler_ancestral, dpmpp_2m, etc.
  5. BasicScheduler

    • Purpose: Set sampling scheduler
    • Parameters:
      • Scheduler: Scheduling method (default simple)
      • Steps: Sampling steps (recommended 20-30)
      • Denoise: Denoising strength (default 1.0)
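
Continuing the sketch from section 4.1, the generation nodes can be wired up as shown below; `["<id>", <slot>]` links a node's output to another node's input. The last three entries (RandomNoise, BasicGuider, SamplerCustomAdvanced) are not covered in the list above but are part of the standard workflow and are what actually runs the sampling; treat the exact input names as assumptions that may vary by ComfyUI version.

```python
# API-format sketch of the generation nodes, continuing the "loaders" dict.
generation_nodes = {
    "4": {"class_type": "EmptyHunyuanLatentVideo", "inputs": {
        "width": 848, "height": 480, "length": 73, "batch_size": 1,
    }},
    "5": {"class_type": "CLIPTextEncode", "inputs": {
        "clip": ["2", 0],
        "text": "anime girl with fennec ears walking in snowy mountains",
    }},
    "6": {"class_type": "FluxGuidance", "inputs": {
        "conditioning": ["5", 0],
        "guidance": 6.0,
    }},
    "7": {"class_type": "KSamplerSelect", "inputs": {"sampler_name": "euler"}},
    "8": {"class_type": "BasicScheduler", "inputs": {
        "model": ["1", 0], "scheduler": "simple", "steps": 20, "denoise": 1.0,
    }},
    # Noise source, guider, and sampler driver (present in the downloaded
    # workflow, not described in the list above).
    "9": {"class_type": "RandomNoise", "inputs": {"noise_seed": 42}},
    "10": {"class_type": "BasicGuider", "inputs": {
        "model": ["1", 0], "conditioning": ["6", 0],
    }},
    "11": {"class_type": "SamplerCustomAdvanced", "inputs": {
        "noise": ["9", 0], "guider": ["10", 0], "sampler": ["7", 0],
        "sigmas": ["8", 0], "latent_image": ["4", 0],
    }},
}
```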

4.3 Video Decoding and Saving Nodes

  1. VAEDecodeTiled

    • Purpose: Decode latent space video to actual video
    • Parameters:
      • Tile Size: 256 (can be reduced if memory is insufficient)
      • Overlap: 64 (can be reduced if memory is insufficient)

    Note: Prefer VAEDecodeTiled over VAEDecode as it’s more memory efficient

  2. SaveAnimatedWEBP

    • Purpose: Save generated video
    • Parameters:
      • Filename Prefix: File name prefix
      • FPS: Frame rate (default 24)
      • Lossless: Whether to use lossless compression (default false)
      • Quality: Compression quality (0-100, default 80)
      • Method: WebP encoding method (default: default)
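
To finish the sketch from sections 4.1 and 4.2, the decoding and saving nodes can be appended and the assembled prompt queued on a locally running ComfyUI instance through its /prompt HTTP endpoint (default local address assumed below). Note that newer ComfyUI builds expose additional tiling inputs on VAEDecodeTiled (for example temporal_size/temporal_overlap); add them if your version requires them.

```python
import json
import urllib.request

# API-format sketch of the decode and save nodes.
output_nodes = {
    "12": {"class_type": "VAEDecodeTiled", "inputs": {
        "samples": ["11", 0],   # latent output of SamplerCustomAdvanced (section 4.2 sketch)
        "vae": ["3", 0],
        "tile_size": 256,
        "overlap": 64,
    }},
    "13": {"class_type": "SaveAnimatedWEBP", "inputs": {
        "images": ["12", 0],
        "filename_prefix": "hunyuan_video",
        "fps": 24,
        "lossless": False,
        "quality": 80,
        "method": "default",
    }},
}

# Merge the fragments from sections 4.1 and 4.2 and queue the job.
prompt = {**loaders, **generation_nodes, **output_nodes}
request = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",   # default local ComfyUI address (assumed)
    data=json.dumps({"prompt": prompt}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(request).read().decode())
```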

5. Parameter Optimization Tips

5.1 Memory Optimization

If you run into out-of-memory (VRAM) errors:

  1. Choose fp8 weight type in UNETLoader
  2. Reduce tile_size and overlap parameters in VAEDecodeTiled
  3. Use lower video resolution and frame count

5.2 Generation Quality Optimization

  1. Prompt Optimization

    [Subject Description], [Action Description], [Scene Description], [Style Description], [Quality Requirements]

    Example:

    anime style anime girl with massive fennec ears and one big fluffy tail, she has blonde hair long hair blue eyes wearing a pink sweater and a long blue skirt walking in a beautiful outdoor scenery with snow mountains in the background
  2. Parameter Adjustments

    • Increase sampling steps for better quality
    • Appropriately increase Guidance Scale for better text adherence
    • Adjust FPS and video quality parameters as needed

6. Common Issues

  1. Insufficient Memory

    • Refer to the suggestions in the memory optimization section (5.1)
    • Close other memory-consuming programs
    • Use lower video resolution settings
  2. Slow Generation Speed

    • This is normal; video generation takes time
    • Reduce sampling steps and frame count to speed things up
    • Use lower resolution to increase speed
  3. Quality Issues

    • Optimize prompt descriptions
    • Increase sampling steps
    • Adjust Guidance Scale
    • Try different samplers