Tencent Announces Hunyuan Image 3.0 Release - World’s Largest Open-Source Text-to-Image Model
On September 28, Tencent officially open-sourced Hunyuan Image 3.0, the first open-source, commercial-grade, native multimodal image generation model. With a total of 80B parameters, it is currently the largest open-source image generation model.
Key Features
Unprecedented Parameter Scale
Hunyuan Image 3.0 has a total of 80B parameters with 13B active parameters, using an MoE (Mixture of Experts) architecture with 64 experts, making it currently the world’s largest open-source text-to-image model.
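To make the sparse-activation idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a toy illustration, not Tencent's implementation: the article gives the expert count (64) and the 80B/13B parameter split, while the value of k, the random gate scores, and the function names are assumptions made for this example.

```python
import math
import random

NUM_EXPERTS = 64        # from the article
TOP_K = 8               # assumption for illustration; the article does not state k
TOTAL_PARAMS_B = 80     # from the article
ACTIVE_PARAMS_B = 13    # from the article


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(gate_logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Hypothetical gate scores for one token; a real router computes these
# from the token's hidden state.
rng = random.Random(0)
gate_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(gate_logits)

# Share of the model's parameters that is active, using the article's figures.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(len(selected), f"{active_fraction:.2%}")  # prints "8 16.25%"
```

Only the selected experts run for a given token, which is how a model with 80B total parameters can operate with roughly one sixth of them active.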
World Knowledge Reasoning Capability
The model has native multimodal capabilities grounded in world-knowledge reasoning, so it can combine common sense and professional knowledge to generate more accurate and richer image content. The model can:
- Generate nine-panel (3×3 grid) sketch tutorials and algorithm flow visualizations
- Explain physical principles, historical events, and biological processes
- Create visual works based on literary works and poetry
Complex Semantic Understanding of Thousand-Character Prompts
Hunyuan Image 3.0 supports complex semantic understanding of prompts longer than 1,000 characters, which is rare among comparable open-source models. The model is able to:
- Process complex scene descriptions
- Understand multi-level detail requirements
- Support bilingual Chinese and English input
Accurate Text Rendering
The model performs exceptionally well in generating text within images, supporting:
- Title text in poster design
- Annotation text in infographics
- Brand logos and trademarks
- Mixed multi-language text
Technical Architecture
Hunyuan Image 3.0 adopts an MoE+Transfusion architecture that unifies multimodal understanding and generation. Unlike traditional DiT (Diffusion Transformer) architectures, this model uses a unified autoregressive framework, achieving deep integration of the text and image modalities.
Training Data
- 5 billion image-text pairs
- 6T text tokens
- Progressive training strategy
- Reinforcement learning post-training optimization
Usage Requirements
Hardware Configuration
Given its 80B parameter count, this model is a significant challenge for ordinary consumer-grade GPUs; even quantized versions may be difficult to run smoothly on such hardware.
- GPU: ≥3×80GB VRAM (recommended 4×80GB)
- Storage: 170GB available space
- Memory: 64GB+ system RAM
- System: Linux + CUDA 12.8
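A back-of-the-envelope estimate shows why the requirements above are so steep. The sketch below computes the memory needed for the weights alone at several precisions. The 80B figure comes from the article; the precision table, the 80 GB per-GPU default, and the 24 GB consumer-card size are illustrative assumptions, and real usage is higher once activations and runtime buffers are included.

```python
import math

TOTAL_PARAMS = 80e9  # total parameter count, from the article

# Bytes per parameter at common precisions (illustrative assumption;
# the article does not say which precision the released weights use).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}

CONSUMER_VRAM_GB = 24  # hypothetical high-end consumer card


def weight_gb(precision):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[precision] / 1e9


def min_gpus(precision, vram_gb=80):
    """Lower bound on GPU count: weights only, no activations or buffers."""
    return math.ceil(weight_gb(precision) / vram_gb)


for precision in BYTES_PER_PARAM:
    fits = weight_gb(precision) <= CONSUMER_VRAM_GB
    print(f"{precision}: {weight_gb(precision):.0f} GB, "
          f"at least {min_gpus(precision)} x 80GB GPU(s), "
          f"fits on one {CONSUMER_VRAM_GB} GB card: {fits}")
```

At 16-bit precision the weights alone come to about 160 GB, which is roughly consistent with the 170GB storage figure and with needing at least three 80GB GPUs once working memory is added. Even at 4 bits, the roughly 40 GB of weights would not fit on a 24 GB card.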
Open Source Plan
Hunyuan Image 3.0 provides a complete open-source solution, including:
- Inference code and model weights
- HunyuanImage-3.0 base version
- HunyuanImage-3.0-Instruct instruction version (supports reasoning capabilities)
- Planned for future releases: image-to-image generation, multi-turn interaction, and other features
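For readers who want to fetch the weights, the following is a hedged sketch rather than official instructions. It assumes the third-party `huggingface_hub` package is installed and uses its `snapshot_download` function with the repository ID taken from the HuggingFace link in this article. The 170 GB free-space check mirrors the storage figure listed under Hardware Configuration; the helper names and the target directory are hypothetical.

```python
import os
import shutil

REPO_ID = "tencent/HunyuanImage-3.0"  # from the HuggingFace link in this article
REQUIRED_GB = 170                     # storage figure from the article


def free_gb(path="."):
    """Free disk space at `path`, in GB (1 GB = 1e9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def download_weights(target_dir):
    """Download the model snapshot into `target_dir` if enough space is free."""
    os.makedirs(target_dir, exist_ok=True)
    available = free_gb(target_dir)
    if available < REQUIRED_GB:
        raise RuntimeError(
            f"Need about {REQUIRED_GB} GB free, found {available:.0f} GB"
        )
    # Imported lazily so the space check works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, local_dir=target_dir)


if __name__ == "__main__":
    print(f"Free space here: {free_gb('.'):.0f} GB (need about {REQUIRED_GB} GB)")
    # Uncomment to start the (very large) download:
    # download_weights("./HunyuanImage-3")
```

The download call is left commented out because it transfers well over a hundred gigabytes; the GitHub repository remains the authoritative source for setup steps.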
Open Source License
Hunyuan Image 3.0 is released under the Tencent Hunyuan Community License Agreement. This license allows:
- Individuals and enterprises to freely use, copy, distribute, and modify the model
- Commercial use and development of derivative works
- Provision of hosted services through APIs or other means
Important Restrictions
- Geographic Restrictions: The license does not apply in the EU, the UK, or South Korea
- User Scale Limitations: If your product or service has more than 100 million monthly active users, you must apply to Tencent for an additional license
- Usage Restrictions: Model outputs may not be used to improve other AI models (except for the Hunyuan series)
- Compliance Requirements: Use must comply with applicable laws and regulations and with acceptable use policies
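The restrictions listed above can be turned into a simple pre-deployment checklist. The sketch below is only an illustration of that idea: the region labels, the `Deployment` fields, and the function name are hypothetical, it encodes only the points stated in this article, and the license text itself remains the authority.

```python
from dataclasses import dataclass

# Regions the article lists as outside the license's scope (labels are hypothetical).
EXCLUDED_REGIONS = {"EU", "UK", "South Korea"}
# Threshold from the article: more than 100 million monthly active users.
MAU_LIMIT = 100_000_000


@dataclass
class Deployment:
    region: str
    monthly_active_users: int
    trains_other_models_on_outputs: bool = False


def license_concerns(d: Deployment) -> list[str]:
    """Return the restrictions from the article that this deployment triggers."""
    concerns = []
    if d.region in EXCLUDED_REGIONS:
        concerns.append(f"license does not apply in {d.region}")
    if d.monthly_active_users > MAU_LIMIT:
        concerns.append("over 100 million MAU: additional license from Tencent required")
    if d.trains_other_models_on_outputs:
        concerns.append("outputs may not be used to improve non-Hunyuan models")
    return concerns


small_app = Deployment(region="US", monthly_active_users=5_000_000)
large_eu_app = Deployment(region="EU", monthly_active_users=250_000_000)

print(license_concerns(small_app))     # prints []
print(len(license_concerns(large_eu_app)))  # prints 2
```

An empty list here only means none of the listed restrictions were triggered; general compliance with applicable laws is not something a check like this can establish.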
Related Links
- Official Website: https://hunyuan.tencent.com/image
- GitHub Repository: https://github.com/Tencent-Hunyuan/HunyuanImage-3.0
- HuggingFace Model: https://huggingface.co/tencent/HunyuanImage-3.0
- Technical Report: HunyuanImage 3.0 Technical Report