Qwen-Image ComfyUI Native, GGUF, and Nunchaku Workflow Complete Usage Guide
Qwen-Image is an image generation foundation model developed by Alibaba's Tongyi Lab, featuring a 20B parameter MMDiT (Multimodal Diffusion Transformer) architecture, and is open-sourced under the Apache 2.0 license. The model demonstrates unique technical advantages in the field of image generation, particularly excelling in text rendering and image editing.
Core Features:
- Multilingual Text Rendering Capability: The model can accurately generate images containing English, Chinese, Korean, Japanese, and multiple other languages, with clear and readable text that harmonizes with the image style
- Rich Artistic Style Support: From realistic styles to artistic creations, from anime styles to modern design, the model can flexibly switch between different visual styles based on prompts
- Precise Image Editing Functionality: Supports local modifications, style transformations, and content additions to existing images while maintaining overall visual consistency
Related Resources:
Qwen-Image ComfyUI Native Workflow Guide
Three different models are used in the workflow attached to this document:
- Original Qwen-Image model fp8_e4m3fn
- 8-step accelerated version: Original Qwen-Image model fp8_e4m3fn using lightx2v 8-step LoRA
- Distilled version: Qwen-Image distilled model fp8_e4m3fn
VRAM Usage Reference (GPU: RTX 4090D 24GB)

Model Used | VRAM Usage | First Generation | Second Generation
---|---|---|---
fp8_e4m3fn | 86% | ≈ 94s | ≈ 71s
fp8_e4m3fn + lightx2v 8-step LoRA | 86% | ≈ 55s | ≈ 34s
Distilled fp8_e4m3fn | 86% | ≈ 69s | ≈ 36s
1. Workflow File
After updating ComfyUI, you can find the workflow file in the templates, or drag the workflow below into ComfyUI to load it
Download Official JSON Format Workflow
Distilled Version
2. Model Download
The following versions are available in the Comfy-Org repository:
- Qwen-Image_bf16 (40.9 GB)
- Qwen-Image_fp8 (20.4 GB)
- Distilled version (unofficial; needs only 15 steps)
All models can be found on Huggingface or ModelScope
Diffusion model
Qwen_image_distill
- The original author of the distilled version recommends 15 steps at cfg 1.0
- Tests show that this distilled version also performs well at 10 steps with cfg 1.0; choose euler or res_multistep depending on the type of image you want
LoRA
Text encoder
VAE
Model storage location
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   ├── qwen_image_fp8_e4m3fn.safetensors
│   │   └── qwen_image_distill_full_fp8_e4m3fn.safetensors ## Distilled version
│   ├── 📂 loras/
│   │   └── Qwen-Image-Lightning-8steps-V1.0.safetensors ## 8-step acceleration LoRA model
│   ├── 📂 vae/
│   │   └── qwen_image_vae.safetensors
│   └── 📂 text_encoders/
│       └── qwen_2.5_vl_7b_fp8_scaled.safetensors
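If you script your downloads, the layout above can be captured as a small lookup that tells you where each file belongs. A minimal sketch in Python (the `target_path` helper and its name are mine; the mapping mirrors the tree above):

```python
import os

# Maps each model file to its subdirectory under ComfyUI/models,
# mirroring the storage tree shown above.
MODEL_DIRS = {
    "qwen_image_fp8_e4m3fn.safetensors": "diffusion_models",
    "qwen_image_distill_full_fp8_e4m3fn.safetensors": "diffusion_models",
    "Qwen-Image-Lightning-8steps-V1.0.safetensors": "loras",
    "qwen_image_vae.safetensors": "vae",
    "qwen_2.5_vl_7b_fp8_scaled.safetensors": "text_encoders",
}

def target_path(comfyui_root: str, filename: str) -> str:
    """Return the path a downloaded file should be moved to."""
    return os.path.join(comfyui_root, "models", MODEL_DIRS[filename], filename)

print(target_path("ComfyUI", "qwen_image_vae.safetensors"))
# -> ComfyUI/models/vae/qwen_image_vae.safetensors (on POSIX systems)
```

Placing a file in the wrong subfolder is the most common reason a loader node's dropdown comes up empty, so a check like this is worth the few lines.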
3. Complete the Workflow Step by Step
- Ensure the `Load Diffusion Model` node loads `qwen_image_fp8_e4m3fn.safetensors`
- Ensure the `Load CLIP` node loads `qwen_2.5_vl_7b_fp8_scaled.safetensors`
- Ensure the `Load VAE` node loads `qwen_image_vae.safetensors`
- Set the image dimensions in the `EmptySD3LatentImage` node
- Set the prompts in the `CLIP Text Encoder` node; currently tested to support at least English, Chinese, Korean, Japanese, and Italian
- To enable the lightx2v 8-step acceleration LoRA, select the node and press `Ctrl + B` to enable it, then adjust the KSampler settings according to the parameters at position 8
- Click the `Queue` button, or use the shortcut `Ctrl(cmd) + Enter` to run the workflow
- Adjust the KSampler parameter settings to match the model version and workflow you are using
The distilled model and the lightx2v 8-step acceleration LoRA do not appear to work together; you can experiment with specific parameter combinations to verify whether combined usage is feasible
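As a quick reference, the step/cfg combinations discussed in this guide can be collected into a lookup table. A minimal Python sketch: the cfg and sampler values for the base fp8 model are my assumptions (verify them against the defaults embedded in the official workflow JSON), while the 8-step LoRA and distilled values follow the text above.

```python
# KSampler presets per workflow variant, collected from this guide.
# NOTE: the "fp8_base" cfg/sampler values are assumptions -- check the
# defaults embedded in the official workflow before relying on them.
KSAMPLER_PRESETS = {
    "fp8_base":       {"steps": 20, "cfg": 2.5, "sampler": "euler"},
    "lightx2v_8step": {"steps": 8,  "cfg": 1.0, "sampler": "euler"},
    "distilled":      {"steps": 10, "cfg": 1.0, "sampler": "res_multistep"},
}

def ksampler_settings(variant: str) -> dict:
    """Return the KSampler preset for a named workflow variant."""
    if variant not in KSAMPLER_PRESETS:
        raise ValueError(f"unknown variant: {variant!r}")
    return KSAMPLER_PRESETS[variant]
```

The distilled entry uses 10 steps per the test note above; the distilled model's original author recommends 15, and euler works for it as well.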
Qwen-Image GGUF Version ComfyUI Workflow
The GGUF version is friendlier for users with low VRAM; with certain quantizations, you need only about 8GB of VRAM to run Qwen-Image
VRAM Usage Reference:
Workflow | VRAM Usage | First Generation | Subsequent Generations
---|---|---|---
qwen-image-Q4_K_S.gguf | 56% | ≈ 135s | ≈ 77s
With 8-step LoRA | 56% | ≈ 100s | ≈ 45s
Model address: Qwen-Image-gguf
1. Update or Install Custom Nodes
Using the GGUF version requires installing or updating the ComfyUI-GGUF custom node
Please refer to How to Install ComfyUI Custom Nodes, or search and install through Manager
2. Workflow Download
3. Model Download
The GGUF version differs from the others only in the diffusion model; the text encoder and VAE are the same as above
Please visit https://huggingface.co/city96/Qwen-Image-gguf to download any of the quantized weights; larger files generally mean better quality but also require more VRAM. This tutorial uses the following version:
📂 ComfyUI/
└── 📂 models/
    └── 📂 diffusion_models/
        └── qwen-image-Q4_K_S.gguf ## Or any other version you choose
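Since a larger GGUF file trades VRAM for quality, a small helper can pick the biggest already-downloaded quant that fits your budget. A hedged sketch (the 1.2x headroom factor and the function name are my guesses; real VRAM use also depends on resolution, the text encoder, and the rest of the pipeline):

```python
import os

def pick_quant(model_dir: str, vram_budget_bytes: int, headroom: float = 1.2):
    """Return the largest .gguf file in model_dir whose size, padded by a
    rough headroom factor, still fits the VRAM budget; None if nothing fits."""
    fitting = [
        (os.path.getsize(os.path.join(model_dir, name)), name)
        for name in os.listdir(model_dir)
        if name.endswith(".gguf")
        and os.path.getsize(os.path.join(model_dir, name)) * headroom <= vram_budget_bytes
    ]
    return max(fitting)[1] if fitting else None
```

If nothing fits, download a smaller quantization from the repository above rather than forcing a larger one into system-RAM offload.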
4. Complete the Workflow Step by Step
- Ensure the `Unet Loader (GGUF)` node loads `qwen-image-Q4_K_S.gguf` or whichever version you downloaded
- Ensure ComfyUI-GGUF is installed and updated
- The `LightX2V 8Steps LoRA` node is not enabled by default; select it and press `Ctrl + B` to enable it
- If the 8-step LoRA is not enabled, the default step count is 20; if you enable it, set the steps to 8
- Refer to the step settings above for each configuration
- Click the `Queue` button, or use the shortcut `Ctrl(cmd) + Enter` to run the workflow
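Beyond clicking `Queue`, runs can also be scripted against ComfyUI's HTTP API: export the workflow in API format (not the normal workflow JSON) and POST it to the `/prompt` endpoint. A minimal sketch using only the standard library; the host/port are ComfyUI's defaults and `client_id` is an arbitrary label of my choosing:

```python
import json
import urllib.request

def build_payload(workflow: dict, client_id: str = "qwen-image-guide") -> bytes:
    """Wrap an API-format workflow in the JSON body that /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI instance and return its JSON
    reply (which includes a prompt_id on success)."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

This is handy for batch generation sweeps, e.g. loading the exported JSON once and queueing it several times with different seeds patched into the KSampler node.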
Qwen-Image Nunchaku Version Workflow
Model address: nunchaku-qwen-image
Custom node address: https://github.com/nunchaku-tech/ComfyUI-nunchaku
Nunchaku support for this workflow is still pending; check the repository above for updates