Tencent Announces Hunyuan Image 3.0 Release - World’s Largest Open-Source Text-to-Image Model

Tencent officially open-sourced Hunyuan Image 3.0 on September 28. It is the first open-source, commercial-grade native multimodal image generation model and, at roughly 80B total parameters, currently the largest open-source image generation model.

Key Features

Unprecedented Parameter Scale

Hunyuan Image 3.0 has 80B total parameters, of which about 13B are active per token, and uses a Mixture of Experts (MoE) architecture with 64 experts, making it currently the world's largest open-source text-to-image model.
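
To illustrate the sparse-activation idea behind those numbers, here is a minimal, generic top-k MoE routing layer in PyTorch. It is a sketch only, not Hunyuan's actual implementation; the hidden sizes, expert count, and top-k value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Generic top-k Mixture-of-Experts layer (illustrative, not Hunyuan's code)."""

    def __init__(self, d_model=1024, d_ff=4096, num_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        gate_logits = self.router(x)
        weights, expert_idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so only a fraction of
        # the layer's parameters is "active" -- the mechanism that lets an
        # 80B-parameter model activate roughly 13B parameters per token.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(8, 1024)
print(SparseMoELayer()(tokens).shape)  # torch.Size([8, 1024])
```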

World Knowledge Reasoning Capability

The model has native multimodal capabilities grounded in world-knowledge reasoning, combining common sense and domain expertise to generate more accurate and richer image content. The model can:

  • Generate nine-panel grid sketch tutorials and algorithm flow visualizations
  • Explain physical principles, historical events, and biological processes
  • Create visual interpretations of literary works and poetry

Complex Semantic Understanding of 1,000+ Character Prompts

Hunyuan Image 3.0 can parse complex prompts of more than 1,000 characters, a capability that is extremely rare among comparable open-source models. The model can:

  • Process complex scene descriptions
  • Understand multi-level detail requirements
  • Support bilingual Chinese and English input

Accurate Text Rendering

The model performs exceptionally well in generating text within images, supporting:

  • Title text in poster design
  • Annotation text in infographics
  • Brand logos and trademarks
  • Mixed multi-language text

Technical Architecture

Hunyuan Image 3.0 adopts an innovative MoE+Transfusion architecture, unifying multimodal understanding and generation capabilities. Unlike traditional DiT architectures, this model uses a unified autoregressive framework, achieving deep integration of text and image modalities.
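
For context, the published Transfusion recipe (which this architecture builds on, by name) trains a single transformer on mixed sequences with two objectives: next-token prediction on text tokens and a diffusion denoising loss on image patches. A generic form of that combined objective is sketched below; Hunyuan Image 3.0's exact loss formulation is not given in this announcement.

```latex
\mathcal{L}_{\text{unified}}
  \;=\;
  \underbrace{\mathcal{L}_{\text{LM}}}_{\text{cross-entropy on text tokens}}
  \;+\;
  \lambda \cdot \underbrace{\mathcal{L}_{\text{diffusion}}}_{\text{denoising loss on image patches}}
```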

Training Data and Strategy

  • 5 billion image-text pairs
  • 6T text tokens
  • Progressive training strategy
  • Reinforcement learning post-training optimization

Usage Requirements

Hardware Configuration

Given its enormous 80B parameter count, this model poses a significant challenge for consumer-grade hardware; even quantized versions are unlikely to run smoothly on ordinary consumer GPUs. A back-of-the-envelope memory estimate follows the list below.

  • GPU: ≥3×80GB VRAM (recommended 4×80GB)
  • Storage: 170GB available space
  • Memory: 64GB+ system RAM
  • System: Linux + CUDA 12.8
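
As a rough sanity check on the GPU requirement, the unquantized weights alone at 16-bit precision already need about twice the memory of a single 80 GB card; activations, the KV cache, and framework overhead push the total toward the 3-4 card recommendation:

```python
# Back-of-the-envelope VRAM estimate for the unquantized weights only.
# Activations, KV cache, and framework overhead are not included, which is
# why the official recommendation is 3-4 x 80 GB GPUs rather than exactly 2.
total_params = 80e9      # 80B total parameters
bytes_per_param = 2      # bf16 / fp16

weight_bytes = total_params * bytes_per_param
print(f"Weights alone: {weight_bytes / 1e9:.0f} GB")                      # ~160 GB
print(f"80 GB cards needed just for weights: {weight_bytes / 80e9:.1f}")  # 2.0
```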

Open Source Plan

Hunyuan Image 3.0 provides a complete open-source solution, including:

  • Inference code and model weights (a hedged loading sketch follows this list)
  • HunyuanImage-3.0 base version
  • HunyuanImage-3.0-Instruct instruction version (supports reasoning capabilities)
  • Image-to-image generation, multi-turn interaction, and other features planned for future releases
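
Since the weights ship as a Hugging Face-style repository with custom modeling code, loading will most likely follow the usual `transformers` remote-code pattern. The snippet below is a hedged sketch only: the repository id and the `generate_image` helper are assumptions based on how comparable trust_remote_code models expose image generation, so check the official HunyuanImage-3.0 README for the exact API.

```python
# Hedged sketch: loading the released weights with Hugging Face transformers.
# The repo id and generate_image() are assumptions -- the model ships its own
# modeling code via trust_remote_code, so the real entry point may differ.
from transformers import AutoModelForCausalLM

model_id = "tencent/HunyuanImage-3.0"  # assumed repository id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # use the custom modeling code shipped in the repo
    torch_dtype="auto",
    device_map="auto",       # shard across the 3-4 x 80 GB GPUs noted above
)

# Assumed model-specific helper (hypothetical name and signature):
image = model.generate_image(prompt="A watercolor poster of a lighthouse at dawn")
image.save("output.png")
```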

Open Source License

Hunyuan Image 3.0 is released under the Tencent Hunyuan Community License Agreement. This license allows:

  • Individuals and enterprises to freely use, copy, distribute, and modify the model
  • Commercial use and the development of derivative works
  • Provision of hosted services through APIs or other means

Important Restrictions

  • Geographic Restrictions: This license does not apply in the EU, the UK, or South Korea
  • User Scale Limitations: If your product or service has more than 100 million monthly active users, you need to apply to Tencent for additional licensing
  • Usage Restrictions: Prohibits using model outputs to improve other AI models (except for the Hunyuan series)
  • Compliance Requirements: Must comply with laws and regulations of various countries and acceptable use policies