THUDM Open-Sources CogView4 - Native Chinese-Supported DiT Text-to-Image Model

CogView4 Sample Outputs

THUDM has officially open-sourced CogView4, a text-to-image generation model presented as the first Diffusion Transformer (DiT) model with native Chinese support and the ability to render Chinese characters in images. The model achieved a top score of 85.13 on the DPG-Bench benchmark, indicating strong image generation quality.

Key Features

Bilingual Generation

  • Enhanced GLM-4 text encoder supporting Chinese-English bilingual input
  • Trained on millions of Chinese-English image-text pairs
  • Achieves 61.68% F1 score in Chinese character generation accuracy

Smart Text Processing

  • Dynamic text length support (up to 1024 tokens)
  • Reduces redundant computations by 50% compared to fixed-length solutions
  • Improves training efficiency by up to 30%
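The saving from dynamic text length comes from not computing on padding tokens. The sketch below compares the share of padding in a batch under a fixed length and under padding to the longest prompt in the batch. The prompt lengths and the fixed length of 512 are made-up illustrative values, not figures from CogView4, and the function names are hypothetical.

```python
def padding_waste(lengths, padded_len):
    """Fraction of token slots in the batch that hold padding."""
    total_slots = padded_len * len(lengths)
    return (total_slots - sum(lengths)) / total_slots


def fixed_vs_dynamic(lengths, fixed_len):
    """Return (waste with fixed-length padding, waste when padding to the batch maximum)."""
    if max(lengths) > fixed_len:
        raise ValueError("a prompt is longer than the fixed length")
    dynamic_len = max(lengths)  # dynamic scheme: pad only up to the longest prompt
    return padding_waste(lengths, fixed_len), padding_waste(lengths, dynamic_len)


# Hypothetical token counts for four prompts in one batch.
example_lengths = [64, 128, 192, 256]
fixed_waste, dynamic_waste = fixed_vs_dynamic(example_lengths, fixed_len=512)
print(f"fixed: {fixed_waste:.1%} padding, dynamic: {dynamic_waste:.1%} padding")
```

The actual reduction depends on how prompt lengths are distributed in the training data, so the numbers here only show the mechanism, not the reported 50% figure.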

Flexible Resolution

  • Supports output from 512px to 2048px
  • Mixed-resolution training for different scenarios
  • Optimized for social media aspect ratios (9:16, 1:1, 4:3)
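A small helper can turn one of these aspect ratios into concrete pixel dimensions inside the 512px to 2048px range. This is a minimal sketch: the function name is hypothetical, and snapping each side to a multiple of 32 is an assumption not stated in this article.

```python
def pick_resolution(aspect_w, aspect_h, long_side=1024,
                    min_side=512, max_side=2048, multiple=32):
    """Return (width, height) close to aspect_w:aspect_h.

    The longer side is set to `long_side`; both sides are then snapped to
    `multiple` (an assumed constraint) and clamped to [min_side, max_side].
    Clamping can distort very elongated aspect ratios.
    """
    if aspect_w >= aspect_h:
        width = long_side
        height = long_side * aspect_h / aspect_w
    else:
        height = long_side
        width = long_side * aspect_w / aspect_h

    def snap(value):
        snapped = round(value / multiple) * multiple
        return max(min_side, min(max_side, snapped))

    return snap(width), snap(height)


for ratio in [(9, 16), (1, 1), (4, 3)]:
    print(ratio, pick_resolution(*ratio))
```

Fixing the longer side keeps the pixel count predictable across ratios, which is convenient when VRAM is the limiting factor.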

Technical Advantages

Innovative “Relay Diffusion” framework:

  1. Base Generation: Rapid low-resolution outline creation
  2. Super-Resolution: Detail refinement through flow-matching
  3. Dynamic Noise Scheduling: Optimizes speed-quality balance
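For readers unfamiliar with flow matching, the toy sketch below shows the general idea in one dimension: a sample is moved from noise toward data by integrating a velocity field with Euler steps. It is not CogView4's implementation. The oracle velocity stands in for a trained network, and all names and values are hypothetical.

```python
def euler_flow(x_start, velocity, steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x_start
    dt = 1.0 / steps
    for k in range(steps):
        t = k / steps  # t stays below 1, so the oracle never divides by zero
        x = x + dt * velocity(x, t)
    return x


def make_oracle_velocity(target):
    """Toy stand-in for a trained model: always points straight at a known target."""
    def velocity(x, t):
        # On the straight path x_t = (1 - t) * x0 + t * target,
        # this equals the constant velocity (target - x0).
        return (target - x) / (1.0 - t)
    return velocity


noise_sample = -2.0  # hypothetical starting noise value
data_sample = 3.0    # hypothetical clean value
result = euler_flow(noise_sample, make_oracle_velocity(data_sample), steps=10)
print(result)
```

In a real model the velocity is predicted by the network from the noisy image, the timestep, and the text embedding, and the number and spacing of steps is what a noise schedule controls.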

Benchmark Performance:

  • DPG-Bench score 85.13 (vs SDXL 74.65 / DALL-E 3 83.50)
  • T2I-CompBench complex scene score 0.3869
  • 114% improvement in Chinese character generation accuracy

Hardware Optimization

Multi-level optimization for different devices:

  • Basic Mode: Runs on RTX 3090 for 512x512 generation
  • Memory Optimization: Reduces VRAM usage to 13GB via CPU offloading
  • 4-bit Quantization: Accelerates inference with compressed text encoder
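The effect of 4-bit quantization on the text encoder can be estimated with simple arithmetic on weight storage. The sketch below assumes an encoder of roughly 9 billion parameters, a figure this article does not state, and it ignores activations and the per-block scale factors that real quantization formats add, so the results are rough lower bounds.

```python
def weight_gib(num_params, bits_per_param):
    """Approximate weight storage in GiB for a given parameter count and precision."""
    return num_params * bits_per_param / 8 / 2**30


ENCODER_PARAMS = 9e9  # assumption: text encoder size, not given in the article

bf16_gib = weight_gib(ENCODER_PARAMS, 16)
int4_gib = weight_gib(ENCODER_PARAMS, 4)
print(f"bf16 weights: {bf16_gib:.1f} GiB, 4-bit weights: {int4_gib:.1f} GiB")
```

Under these assumptions the encoder weights shrink by a factor of four, which is why quantizing the text encoder is an effective lever on consumer GPUs.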

Usage

A demo is available through HuggingFace Spaces for instant testing. Developers can also access the full codebase, which supports:

  • Mixed Chinese-English prompts
  • Custom output dimensions
  • Batch generation support
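The three options above map naturally onto a single pipeline call. The following is a minimal sketch that assumes the model is loaded through the `CogView4Pipeline` class in the diffusers library under the model ID `THUDM/CogView4-6B`. The helper names, the default guidance scale, and the step count are illustrative choices rather than values given in this article.

```python
def build_call_kwargs(prompt, width=1024, height=1024, batch_size=1):
    """Collect keyword arguments for one pipeline call (pure Python, no GPU needed)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        "prompt": prompt,                     # may mix Chinese and English
        "width": width,                       # custom output dimensions
        "height": height,
        "num_images_per_prompt": batch_size,  # batch generation
        "guidance_scale": 3.5,                # illustrative default
        "num_inference_steps": 50,            # illustrative default
    }


def generate(prompt, **options):
    """Load the model and return a list of PIL images. Downloads weights; needs a GPU."""
    # Heavy imports stay local so build_call_kwargs works without them installed.
    import torch
    from diffusers import CogView4Pipeline

    pipe = CogView4Pipeline.from_pretrained(
        "THUDM/CogView4-6B",  # assumed model ID
        torch_dtype=torch.bfloat16,
    )
    # Memory savers: offload idle submodules to CPU, decode the VAE in slices and tiles.
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe(**build_call_kwargs(prompt, **options)).images
```

For example, `generate("一只戴着墨镜的熊猫 holding a sign that says CogView4", width=768, height=1024, batch_size=2)` would request two portrait images from one mixed-language prompt.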

Resources

THUDM plans to release ControlNet modules, ComfyUI workflow support, and fine-tuning toolkits within three months to enhance accessibility for non-technical users.