ComfyUI Wiki

Qwen-Image ComfyUI Native, GGUF, and Nunchaku Workflow Complete Usage Guide

Qwen-Image is an image generation foundation model developed by Alibaba's Tongyi Lab, featuring a 20B parameter MMDiT (Multimodal Diffusion Transformer) architecture, and is open-sourced under the Apache 2.0 license. The model demonstrates unique technical advantages in the field of image generation, particularly excelling in text rendering and image editing.

Core Features:

  • Multilingual Text Rendering Capability: The model can accurately generate images containing English, Chinese, Korean, Japanese, and multiple other languages, with clear and readable text that harmonizes with the image style
  • Rich Artistic Style Support: From realistic styles to artistic creations, from anime styles to modern design, the model can flexibly switch between different visual styles based on prompts
  • Precise Image Editing Functionality: Supports local modifications, style transformations, and content additions to existing images while maintaining overall visual consistency


Qwen-Image ComfyUI Native Workflow Guide

Three different models are used in the workflow attached to this document:

  1. Original Qwen-Image model fp8_e4m3fn
  2. 8-step accelerated version: Original Qwen-Image model fp8_e4m3fn using lightx2v 8-step LoRA
  3. Distilled version: Qwen-Image distilled model fp8_e4m3fn

VRAM usage reference (GPU: RTX 4090D, 24GB):

Model Used                               VRAM Usage    First Generation    Second Generation
fp8_e4m3fn                               86%           ≈ 94s               ≈ 71s
fp8_e4m3fn + lightx2v 8-step LoRA        86%           ≈ 55s               ≈ 34s
Distilled fp8_e4m3fn                     86%           ≈ 69s               ≈ 36s

1. Workflow File

After updating ComfyUI, you can find the workflow file in the templates, or drag the workflow below into ComfyUI to load it: Qwen-Image Text-to-Image Workflow

Download Official JSON Format Workflow

Distilled Version

2. Model Download

The following versions are available in the Comfy-Org repository:

  • Qwen-Image_bf16 (40.9 GB)
  • Qwen-Image_fp8 (20.4 GB)
  • Distilled version (non-official, only 15 steps)

All models can be found on Hugging Face or ModelScope.

Diffusion model

Qwen_image_distill

  • The original author of the distilled version recommends 15 steps, cfg 1.0
  • Tests show that this distilled version also performs well at 10 steps, cfg 1.0; choose euler or res_multistep depending on the type of image you want

LoRA

Text encoder

VAE

Model storage location

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   ├── qwen_image_fp8_e4m3fn.safetensors
│   │   └── qwen_image_distill_full_fp8_e4m3fn.safetensors ## Distilled version
│   ├── 📂 loras/
│   │   └── Qwen-Image-Lightning-8steps-V1.0.safetensors   ## 8-step acceleration LoRA model
│   ├── 📂 vae/
│   │   └── qwen_image_vae.safetensors
│   └── 📂 text_encoders/
│       └── qwen_2.5_vl_7b_fp8_scaled.safetensors
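Once the files are downloaded, you can sanity-check the layout with a short script. This is a minimal sketch, assuming ComfyUI is installed at `./ComfyUI`; the `check_models` helper and the required/optional split are illustrative, not part of ComfyUI itself:

```python
from pathlib import Path

# Assumption: ComfyUI lives in the current directory; adjust as needed
COMFYUI_ROOT = Path("ComfyUI")

# Files from the layout above; the distilled model and LoRA are optional extras
EXPECTED = {
    "diffusion_models/qwen_image_fp8_e4m3fn.safetensors": True,    # required
    "diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors": False,
    "loras/Qwen-Image-Lightning-8steps-V1.0.safetensors": False,
    "vae/qwen_image_vae.safetensors": True,                        # required
    "text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors": True,   # required
}

def check_models(root: Path = COMFYUI_ROOT) -> list[str]:
    """Return the required files that are missing under root/models."""
    missing = []
    for rel, required in EXPECTED.items():
        if required and not (root / "models" / rel).is_file():
            missing.append(rel)
    return missing

# Example: list any required files that are not in place yet
for rel in check_models():
    print("missing:", rel)
```

If the script prints nothing, the three required files (diffusion model, VAE, text encoder) are where the workflow expects them.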

3. Complete the Workflow Step by Step

Step Diagram

  1. Ensure the Load Diffusion Model node loads qwen_image_fp8_e4m3fn.safetensors
  2. Ensure the Load CLIP node loads qwen_2.5_vl_7b_fp8_scaled.safetensors
  3. Ensure the Load VAE node loads qwen_image_vae.safetensors
  4. Ensure the image dimensions are set in the EmptySD3LatentImage node
  5. Set the prompts in the CLIP Text Encoder node; testing so far confirms support for at least English, Chinese, Korean, Japanese, and Italian
  6. To enable the lightx2v 8-step acceleration LoRA, select the node and press Ctrl + B to enable it, then adjust the KSampler settings according to the parameters at position 8
  7. Click the Queue button, or use the shortcut Ctrl(cmd) + Enter to run the workflow
  8. Parameter settings for KSampler corresponding to different versions of models and workflows

The distilled model and the lightx2v 8-step acceleration LoRA do not appear to work together; you can experiment with specific parameter combinations to verify whether combined usage is feasible.
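The KSampler settings for the three variants can be summarized as follows. The distilled model's steps and cfg come from this guide; the base model's cfg and the LoRA run's cfg are assumptions, so confirm them against the template workflow's defaults:

```python
# KSampler presets per model variant. Distilled steps/cfg are from this guide;
# the cfg values for the base and LoRA runs are ASSUMPTIONS -- check the
# defaults shipped in the ComfyUI template workflow.
PRESETS = {
    "base_fp8":      {"steps": 20, "cfg": 2.5, "sampler": "euler"},  # cfg is an assumption
    "lightx2v_lora": {"steps": 8,  "cfg": 1.0, "sampler": "euler"},  # cfg is an assumption
    "distill_fp8":   {"steps": 15, "cfg": 1.0, "sampler": "euler"},  # 10 steps also works; res_multistep is an alternative
}

def preset_for(variant: str) -> dict:
    """Look up the KSampler parameters for a given model variant."""
    return PRESETS[variant]
```

For example, `preset_for("lightx2v_lora")` returns the 8-step configuration to enter when the acceleration LoRA node is enabled.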

Qwen-Image GGUF Version ComfyUI Workflow

The GGUF version is friendlier for users with low VRAM; with certain quantizations, about 8GB of VRAM is enough to run Qwen-Image.

VRAM usage reference:

Workflow                     VRAM Usage    First Generation    Subsequent Generations
qwen-image-Q4_K_S.gguf       56%           ≈ 135s              ≈ 77s
With 8-step LoRA             56%           ≈ 100s              ≈ 45s

Model address: Qwen-Image-gguf

1. Update or Install Custom Nodes

Using the GGUF version requires you to install or update the ComfyUI-GGUF plugin

Please refer to How to Install ComfyUI Custom Nodes, or search and install through Manager

2. Workflow Download

Qwen-Image GGUF Workflow

3. Model Download

The GGUF version differs from the other workflows only in the diffusion model it uses.

Please visit https://huggingface.co/city96/Qwen-Image-gguf to download any of the weights; typically, larger files mean better quality but also require more VRAM. This tutorial uses the following version:

📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── qwen-image-Q4_K_S.gguf # Or any other version you choose
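The download can be scripted with the `huggingface_hub` client. This is a minimal sketch, assuming `pip install huggingface_hub`; swap `FILENAME` for any other quantization from the repository:

```python
# Repo and file named in this guide; any other quant from the repo also works
REPO_ID = "city96/Qwen-Image-gguf"
FILENAME = "qwen-image-Q4_K_S.gguf"
TARGET_DIR = "ComfyUI/models/diffusion_models"  # assumption: ComfyUI in cwd

def local_path() -> str:
    """Where the file should end up for the Unet Loader (GGUF) node."""
    return f"{TARGET_DIR}/{FILENAME}"

def download() -> str:
    """Fetch the GGUF weight; requires network and huggingface_hub installed."""
    from huggingface_hub import hf_hub_download  # imported lazily
    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME,
                           local_dir=TARGET_DIR)

# Usage (commented out so the sketch runs offline):
# print(download())
```

After downloading, the file should sit at `ComfyUI/models/diffusion_models/qwen-image-Q4_K_S.gguf`, matching the tree above.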

4. Complete the Workflow Step by Step

Step Diagram

  1. Ensure the Unet Loader (GGUF) node loads qwen-image-Q4_K_S.gguf or whichever version you downloaded
    • Ensure ComfyUI-GGUF is installed and up to date
  2. The LightX2V 8-step LoRA is disabled by default; select the node and press Ctrl + B to enable it
  3. If the 8-step LoRA is not enabled, the default steps are 20; if you enable the 8-step LoRA, please set it to 8
  4. Here is the reference for setting the corresponding steps
  5. Click the Queue button, or use the shortcut Ctrl(cmd) + Enter to run the workflow
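The step count rule from points 3 and 4 above is simple enough to express directly (a one-line sketch; the function name is illustrative):

```python
# GGUF workflow step counts from this guide: 20 steps by default,
# 8 steps when the LightX2V 8-step LoRA node is enabled.
def gguf_steps(lora_enabled: bool) -> int:
    """Return the KSampler step count for the GGUF workflow."""
    return 8 if lora_enabled else 20
```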

Qwen-Image Nunchaku Version Workflow

Model address: nunchaku-qwen-image
Custom node address: https://github.com/nunchaku-tech/ComfyUI-nunchaku

Nunchaku support pending