ComfyUI Wiki

Qwen-Image gets native support in ComfyUI

Qwen-Image example

Qwen-Image is a 20B-parameter MMDiT (Multimodal Diffusion Transformer) image generation model designed for complex text rendering and fine-grained editing. It is open-sourced under the Apache‑2.0 license. The model recently gained native support in ComfyUI, making it easy to try via templates.

Model highlights

According to the project page, the model is strongest at text-heavy images and consistent editing, while also offering broad generation and image-understanding capabilities:

  • Complex text rendering: preserves typographic details and layout consistency across languages (e.g., Chinese and English); suited to images with headings, slogans, and structured layouts
  • Precise image editing: supports style transfer, object insertion/removal, detail enhancement, text editing within images, and even human pose adjustment
  • General generation ability: smoothly adapts to many styles—from photorealistic to impressionist, anime, and minimalist design
  • Image understanding tasks: object detection, semantic segmentation, depth and edge (Canny) estimation, novel‑view synthesis, and super‑resolution
  • Ecosystem and extensibility: project updates indicate support for various LoRAs (e.g., MajicBeauty) and provide multi‑GPU inference and queue‑management examples for scalable, high‑concurrency scenarios

Versions currently available in ComfyUI

  • Qwen-Image_bf16 (≈ 40.9 GB)
  • Qwen-Image_fp8 (≈ 20.4 GB)
  • Unofficial distilled variants (fewer inference steps)

Model resources are available here:

  • Hugging Face - Comfy-Org/Qwen-Image_ComfyUI
  • ModelScope - Comfy-Org/Qwen-Image_ComfyUI
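The listed file sizes line up roughly with the parameter count. As a back-of-the-envelope sketch, the snippet below multiplies the nominal 20B parameters by the storage width of each format. It assumes 2 bytes per weight for bf16, 1 byte per weight for fp8, and decimal gigabytes (1 GB = 10^9 bytes); the function name `estimated_size_gb` is purely illustrative and not part of ComfyUI.

```python
# Rough weights-only size estimate. Assumptions (not from the article):
# bf16 stores 2 bytes per weight, fp8 stores 1 byte, and 1 GB = 1e9 bytes.
PARAMS = 20e9  # nominal parameter count ("20B") quoted in the article

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def estimated_size_gb(params: float, dtype: str) -> float:
    """Return the approximate weights-only size in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


# Compare the estimate with the sizes listed for the ComfyUI checkpoints.
for dtype, listed_gb in (("bf16", 40.9), ("fp8", 20.4)):
    est = estimated_size_gb(PARAMS, dtype)
    print(f"{dtype}: estimated {est:.1f} GB, listed {listed_gb} GB")
```

The small gap between the estimate and the listed sizes is expected, since "20B" is a rounded figure and a checkpoint file can contain more than the raw weights.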

Performance

Below are measurements taken by the ComfyUI Wiki while preparing official documentation, using an RTX 4090D 24 GB:

Qwen-Image_fp8

  • VRAM: 86%
  • Generation time: 94 s (first run), 71 s (second)

Qwen-Image_bf16

  • VRAM: 96%
  • Generation time: 295 s (first run), 131 s (second)
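To put these figures in more familiar terms, the following sketch converts the VRAM percentages into gigabytes and computes how much faster the second run is than the first. It assumes the percentages are relative to the card's full 24 GB; the names `RUNS` and `summarize` are illustrative only.

```python
# Derived figures from the measurements above.
# Assumption: the VRAM percentages refer to the full 24 GB of the RTX 4090D.
GPU_VRAM_GB = 24

RUNS = {
    "Qwen-Image_fp8": {"vram_pct": 86, "first_s": 94, "second_s": 71},
    "Qwen-Image_bf16": {"vram_pct": 96, "first_s": 295, "second_s": 131},
}


def summarize(run: dict) -> tuple[float, float]:
    """Return (VRAM used in GB, first-run time divided by second-run time)."""
    vram_gb = GPU_VRAM_GB * run["vram_pct"] / 100
    second_run_speedup = run["first_s"] / run["second_s"]
    return vram_gb, second_run_speedup


for name, run in RUNS.items():
    vram_gb, speedup = summarize(run)
    print(f"{name}: about {vram_gb:.1f} GB VRAM, second run {speedup:.2f}x faster")

# Ratio of second-run times between the two precisions.
ratio = RUNS["Qwen-Image_bf16"]["second_s"] / RUNS["Qwen-Image_fp8"]["second_s"]
print(f"bf16 second run takes {ratio:.2f}x as long as fp8")
```

Under these assumptions, fp8 uses roughly 20.6 GB and bf16 roughly 23.0 GB of VRAM. Note that the bf16 checkpoint (≈ 40.9 GB) is larger than the card's 24 GB, so the VRAM percentage alone does not describe its full memory footprint.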

Sources and further reading