Tencent Announces Hunyuan Image 3.0 Release - World’s Largest Open-Source Text-to-Image Model

On September 28, Tencent officially open-sourced Hunyuan Image 3.0, the first open-source, commercial-grade, natively multimodal image generation model. With 80B total parameters, it is also currently the largest open-source image generation model.

Key Features

Unprecedented Parameter Scale

Hunyuan Image 3.0 has 80B parameters in total, of which 13B are activated per token. It uses an MoE (Mixture of Experts) architecture with 64 experts, making it currently the world’s largest open-source text-to-image model.
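To make "active parameters" concrete, the toy router below sends one token to only a few of the 64 experts and mixes their outputs. It is a minimal sketch in pure Python: the choice of k = 8 experts per token, the softmax gating over the selected logits, and the scalar toy experts are all assumptions for illustration, not HunyuanImage 3.0's actual routing code.

```python
import math
import random

NUM_EXPERTS = 64  # expert count from the article
TOP_K = 8         # assumption: illustrative value, not confirmed for HunyuanImage 3.0


def top_k_route(logits, k):
    """Return (expert_index, weight) pairs for the k highest router logits.

    Weights are a softmax over only the selected logits, so they sum to 1.
    """
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, router_logits, experts, k=TOP_K):
    """Weighted sum of the outputs of only the k routed experts for input x."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    random.seed(0)
    # Toy experts: expert i simply scales its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    routes = top_k_route(logits, TOP_K)
    print(len(routes), "of", NUM_EXPERTS, "experts run for this token")
    print("output:", moe_forward(1.0, logits, experts))
    # Active-parameter share implied by the article's figures: 13B of 80B.
    print(f"active share: {13 / 80:.2%}")
```

Because only the routed experts run, the compute spent on each token scales with the active parameters (13B here) rather than with the full 80B.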

World Knowledge Reasoning Capability

Built on native multimodal capabilities, the model can reason with world knowledge, combining common sense and specialist knowledge to generate more accurate and richer image content. The model can:

Generate nine-panel grid sketching tutorials and algorithm flowcharts
Visually explain physical principles, historical events, and biological processes
Create artwork based on literature and poetry

Complex Semantic Understanding of Prompts over 1,000 Characters

Hunyuan Image 3.0 can understand complex prompts of more than 1,000 characters, a capability that is rare among comparable open-source models. The model is able to:

Process complex scene descriptions
Understand multi-level detail requirements
Support bilingual Chinese and English input

Accurate Text Rendering

The model performs exceptionally well at rendering text within images, including:

Title text in poster design
Annotation text in infographics
Brand logos and trademarks
Text that mixes multiple languages

Technical Architecture

Hunyuan Image 3.0 adopts an MoE + Transfusion architecture that unifies multimodal understanding and generation. Unlike conventional DiT (Diffusion Transformer) architectures, it uses a unified autoregressive framework that deeply integrates the text and image modalities.
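The Transfusion recipe trains a single transformer with next-token prediction on text and a diffusion objective on images, and pairs this with an attention mask that is causal across the sequence but bidirectional within each image. The sketch below builds such a mask for a toy text/image sequence in pure Python; the segment layout is invented, and whether HunyuanImage 3.0 uses exactly this masking rule is an assumption here.

```python
from typing import List, Tuple

# A sequence is a list of (modality, length) segments, e.g. ("text", 3), ("image", 4).
Segment = Tuple[str, int]


def transfusion_mask(segments: List[Segment]) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Rule (Transfusion-style): every position sees all earlier positions
    (causal), and positions inside the same image additionally see each
    other in both directions.
    """
    # Record which image (if any) each position belongs to.
    image_id: List[int] = []
    next_image = 0
    for modality, length in segments:
        if modality == "image":
            image_id.extend([next_image] * length)
            next_image += 1
        else:
            image_id.extend([-1] * length)  # -1 marks a text position

    n = len(image_id)
    return [
        [k <= q or (image_id[q] != -1 and image_id[q] == image_id[k]) for k in range(n)]
        for q in range(n)
    ]


if __name__ == "__main__":
    mask = transfusion_mask([("text", 2), ("image", 3), ("text", 1)])
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
```

In the printed grid, the three image positions (rows 2 to 4) can see one another in both directions, while the text positions remain strictly causal.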

Training Data

5 billion image-text pairs
6T text tokens
Progressive training strategy
Post-training optimization with reinforcement learning

Usage Requirements

Hardware Configuration

Given its 80B parameters, the model poses a significant challenge for ordinary consumer-grade GPUs, and even quantized versions may not run smoothly on them. The hardware requirements are:

GPU: ≥3×80GB VRAM (4×80GB recommended)
Storage: 170GB of free disk space
Memory: 64GB+ system RAM
System: Linux + CUDA 12.8
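A quick back-of-the-envelope calculation shows why these numbers are so high. The sketch assumes 2 bytes per parameter for 16-bit (bf16) weights, and 1 or 0.5 bytes for hypothetical 8-bit and 4-bit quantized versions; it counts only the weights, so activations and other runtime memory come on top.

```python
import math

TOTAL_PARAMS = 80e9  # total parameter count from the article

# Bytes per parameter for common storage formats (assumed; the article
# does not state which precision the released weights use).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(num_params: float, dtype: str) -> float:
    """Gigabytes (1e9 bytes) needed to hold the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def gpus_for_weights(num_params: float, dtype: str, gpu_gb: float) -> int:
    """Minimum number of GPUs whose combined VRAM fits the weights alone.

    Ignores activations, caches, and framework overhead, so real
    deployments need more than this lower bound.
    """
    return math.ceil(weight_gb(num_params, dtype) / gpu_gb)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_gb(TOTAL_PARAMS, dtype):.0f} GB of weights, "
              f">= {gpus_for_weights(TOTAL_PARAMS, dtype, 80)} x 80GB GPU(s), "
              f">= {gpus_for_weights(TOTAL_PARAMS, dtype, 24)} x 24GB GPU(s)")
```

Under these assumptions the bf16 weights alone take 160 GB, filling two 80GB GPUs before any activation memory is counted, which is in line with the ≥3×80GB requirement and the 170GB storage figure. Even a 4-bit version would need about 40 GB for its weights, more than a typical 24GB consumer card holds.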

Open Source Plan

Hunyuan Image 3.0 provides a complete open-source solution, including:

Inference code and model weights
HunyuanImage-3.0 base version
HunyuanImage-3.0-Instruct, the instruction-tuned version (supports reasoning)
Future support for image-to-image generation, multi-turn interaction, and other features

Open Source License

Hunyuan Image 3.0 is released under the Tencent Hunyuan Community License Agreement. This license allows:

Individuals and enterprises to freely use, copy, distribute, and modify the model
Commercial use and the development of derivative works
Offering hosted services through APIs or other means

Important Restrictions

Geographic Restrictions: The license does not apply in the European Union, the United Kingdom, or South Korea
User Scale Limitations: If your product or service has more than 100 million monthly active users, you must request an additional license from Tencent
Usage Restrictions: Model outputs may not be used to improve other AI models (models in the Hunyuan series excepted)
Compliance Requirements: Use must comply with the laws and regulations of each applicable jurisdiction and with the Acceptable Use Policy

Official Website: https://hunyuan.tencent.com/image
GitHub Repository: https://github.com/Tencent-Hunyuan/HunyuanImage-3.0
HuggingFace Model: https://huggingface.co/tencent/HunyuanImage-3.0
Technical Report: HunyuanImage 3.0 Technical Report

Alibaba Tongyi Lab Releases Z-Image-Turbo - Efficient 6B Parameter Image Generation Model