
ComfyUI Sonic Workflow for Digital Human Video Generation

Sonic is an open-source digital human model from Tencent that can generate impressive video from just an image and an audio clip as input.

Here are the original Sonic-related links:

  • Project page: https://jixiaozhong.github.io/Sonic/
  • Online demo: http://demo.sonic.jixiaozhong.online/
  • Source code: https://github.com/jixiaozhong/Sonic

Community members have recently completed the plugin integration. This tutorial uses the ComfyUI_Sonic plugin to reproduce Sonic’s official example results.

πŸ’‘

Currently, I’m still experiencing some issues running this workflow. I will update this tutorial with corresponding instructions once testing is complete.

1. ComfyUI Sonic Plugin Installation

This workflow depends on the following plugins. Please make sure the plugins and their dependencies are installed, or install any missing nodes with ComfyUI Manager after downloading the workflow:

  • ComfyUI_Sonic: https://github.com/smthemex/ComfyUI_Sonic
  • ComfyUI-VideoHelperSuite: https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite

If you’re unfamiliar with the installation process, please refer to the ComfyUI Plugin Installation Tutorial
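If you prefer the command line to ComfyUI Manager, the sketch below clones both plugins into custom_nodes and installs their Python dependencies. It is only a sketch: it assumes ComfyUI sits in the current directory and that git and pip are available on your PATH, so adjust the paths to your own setup.

import subprocess
from pathlib import Path

COMFYUI_DIR = Path("ComfyUI")                  # assumption: ComfyUI checkout in the current directory
CUSTOM_NODES = COMFYUI_DIR / "custom_nodes"

PLUGINS = [
    "https://github.com/smthemex/ComfyUI_Sonic",
    "https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite",
]

for repo in PLUGINS:
    target = CUSTOM_NODES / repo.rsplit("/", 1)[-1]
    if not target.exists():
        # clone the plugin repository into custom_nodes
        subprocess.run(["git", "clone", repo, str(target)], check=True)
    requirements = target / "requirements.txt"
    if requirements.exists():
        # install the plugin's Python dependencies
        subprocess.run(["pip", "install", "-r", str(requirements)], check=True)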

2. Downloading and Installing Sonic Models

The plugin repository provides model downloads. If the following model links are invalid or inaccessible, please check the plugin author’s repository for updates.

Models should be saved in the following locations:

πŸ“ComfyUI
β”œβ”€β”€ πŸ“models
β”‚   β”œβ”€β”€ πŸ“checkpoints
β”‚   β”‚      └── πŸ“video                         //  video folder for model categorization (optional) 
β”‚   β”‚           └── svd_xt_1_1.safetensors     // svd_xt.safetensors or svd_xt_1_1.safetensors model file 
β”‚   └── πŸ“sonic                                // Create new sonic folder, save all content here from Google Drive
β”‚       β”œβ”€β”€ πŸ“ whisper-tiny                            
β”‚       β”‚   β”œβ”€β”€ config.json 
β”‚       β”‚   β”œβ”€β”€ model.safetensors
β”‚       β”‚   └── preprocessor_config.json
β”‚       β”œβ”€β”€ πŸ“ RIFE  
β”‚       β”‚   └── flownet.pkl
β”‚       β”œβ”€β”€ audio2bucket.pth
β”‚       β”œβ”€β”€ audio2token.pth
β”‚       β”œβ”€β”€ unet.pth
β”‚       └── yoloface_v5m.pt
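Once you have finished the downloads in sections 2.1 to 2.3 below, you can sanity-check the layout with a short script. This is only a sketch mirroring the tree above; it assumes ComfyUI/models is reachable from where you run it and that you kept the optional video subfolder (adjust the first entry otherwise).

from pathlib import Path

MODELS = Path("ComfyUI/models")  # assumption: relative path to your ComfyUI install

expected = [
    MODELS / "checkpoints/video/svd_xt_1_1.safetensors",  # or svd_xt.safetensors
    MODELS / "sonic/whisper-tiny/config.json",
    MODELS / "sonic/whisper-tiny/model.safetensors",
    MODELS / "sonic/whisper-tiny/preprocessor_config.json",
    MODELS / "sonic/RIFE/flownet.pkl",
    MODELS / "sonic/audio2bucket.pth",
    MODELS / "sonic/audio2token.pth",
    MODELS / "sonic/unet.pth",
    MODELS / "sonic/yoloface_v5m.pt",
]

# print OK/MISSING for each file the workflow expects
for path in expected:
    print(f"{'OK     ' if path.exists() else 'MISSING'}  {path}")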

2.1 Choose one of these Stable Video Diffusion models:

  • svd_xt_1_1.safetensors: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1/tree/main
  • svd_xt.safetensors: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/tree/main
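As an alternative to downloading in the browser, the sketch below fetches svd_xt_1_1.safetensors with the huggingface_hub package (pip install huggingface_hub). Note that the Stability AI repositories are license-gated, so you may first need to accept the terms on the model page and log in with huggingface-cli login; the target folder is an assumption that follows the tree above.

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    filename="svd_xt_1_1.safetensors",
    local_dir="ComfyUI/models/checkpoints/video",  # assumption: folder layout from the tree above
)
print("saved to:", path)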

2.2 Download Sonic models

Visit the following Google Drive folder and download all of its contents into ComfyUI/models/sonic:

  • Sonic models: https://drive.google.com/drive/folders/1oe8VTPUy0-MHHW2a_NJ1F8xL-0VN5G7W
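If you prefer to script this step, the third-party gdown package (pip install gdown) can download a shared Drive folder. Large shared folders sometimes hit Drive quota limits, in which case download the files manually in the browser instead. The output path below is an assumption taken from the folder tree above.

import gdown

gdown.download_folder(
    url="https://drive.google.com/drive/folders/1oe8VTPUy0-MHHW2a_NJ1F8xL-0VN5G7W",
    output="ComfyUI/models/sonic",  # assumption: target folder from the tree above
    quiet=False,
)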

2.3 Download whisper-tiny model

  • whisper-tiny: https://huggingface.co/openai/whisper-tiny/tree/main

Download only these three files:

  • config.json
  • model.safetensors
  • preprocessor_config.json
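The same huggingface_hub approach works here. This sketch grabs only the three files listed above and places them in the sonic/whisper-tiny folder from the tree; the local path is an assumption, so adjust it to your install.

from huggingface_hub import hf_hub_download

# download just the three whisper-tiny files the plugin needs
for filename in ["config.json", "model.safetensors", "preprocessor_config.json"]:
    hf_hub_download(
        repo_id="openai/whisper-tiny",
        filename=filename,
        local_dir="ComfyUI/models/sonic/whisper-tiny",  # assumption: path from the tree above
    )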

ComfyUI Sonic Workflow Resources

Please download the following image, audio, and workflow files, or use your own materials:

Image: Sonic input anime (example portrait)

Audio: download any sample audio file from https://github.com/smthemex/ComfyUI_Sonic/tree/main/examples/wav

ComfyUI Sonic Workflow Explanation

(Workflow screenshot: Sonic Workflow)

  1. At position 1, load the Stable Video Diffusion model, e.g. svd_xt_1_1.safetensors
  2. At position 2, upload and load the audio file
  3. At position 3, upload the sample image
  4. At position 4, load the unet.pth model file
  5. Use Queue or the shortcut Ctrl(Command)+Enter to run the workflow and generate the video

Troubleshooting

  1. Transformers version issue: this plugin requires transformers==4.43.2. If your workflow does not run properly, edit the plugin’s requirements file:
πŸ“ComfyUI
β”œβ”€β”€ πŸ“custom_nodes
β”‚   └── πŸ“ComfyUI_Sonic           // Plugin directory
β”‚       └── requirements.txt      // Plugin requirements file

In requirements.txt, change:

#transformers ==4.43.2

to (remove the leading #):

transformers ==4.43.2

Then restart ComfyUI, or install the dependency manually with pip.
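If you would rather pin the version from a script, the sketch below checks which transformers version is installed and installs 4.43.2 if it differs. Run it with the same interpreter that runs ComfyUI (for example, the embedded Python of a portable install).

import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version

try:
    current = version("transformers")
except PackageNotFoundError:
    current = None

print("installed transformers:", current)
if current != "4.43.2":
    # install the pinned version into the interpreter running this script
    subprocess.run([sys.executable, "-m", "pip", "install", "transformers==4.43.2"], check=True)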

  2. frame_rate type mismatch issue: I encountered a numeric type mismatch on the last node’s frame_rate input. As a workaround, I tried feeding that input from a Primitive node.

(Screenshot: Type mismatch)

As we are still testing this workflow, if you have a better solution, please leave a comment and I will update this tutorial promptly.