
THUDM Open-Sources New Image Generation Models: CogView3 and CogView-3Plus-3B

THUDM recently open-sourced its latest image generation models, CogView3 and CogView-3Plus-3B, on GitHub. Both are text-to-image models, and both aim to combine high output quality with efficient inference.

CogView3: Innovation in Cascaded Diffusion

CogView3 is a text-to-image generation system based on cascaded diffusion. It uses a framework called “relay diffusion,” which splits the generation of a high-resolution image into multiple stages. The system first generates a low-resolution image. The relay super-resolution stage then adds Gaussian noise to that image and starts a new diffusion process from the noisy result rather than from pure noise.
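To make the relay idea concrete, here is a minimal toy sketch of the second stage in pure Python. It is not THUDM's implementation: the function names, the nearest-neighbour upsampling, the noise level `sigma`, and in particular `toy_denoise_step` (a hand-written stand-in for a learned denoising network) are all assumptions made for illustration. The only point it shows is the order of operations: take the low-resolution result, enlarge it, add Gaussian noise, and run the denoising loop from that noisy image.

```python
import random


def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling of a 2D list of floats."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def add_gaussian_noise(img, sigma, rng):
    """Add independent Gaussian noise with standard deviation sigma to each pixel."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


def toy_denoise_step(img, guide, strength):
    """Hypothetical stand-in for a learned denoiser.

    A real model predicts the clean image from the noisy one; this toy
    version simply pulls every pixel a fraction of the way towards `guide`.
    """
    return [
        [v + strength * (g - v) for v, g in zip(row, guide_row)]
        for row, guide_row in zip(img, guide)
    ]


def relay_super_resolution(low_res, factor=2, sigma=0.5, steps=10, seed=0):
    """Start the second diffusion stage from a noised, upsampled image."""
    rng = random.Random(seed)
    upsampled = upsample_nearest(low_res, factor)
    x = add_gaussian_noise(upsampled, sigma, rng)  # relay starting point
    for _ in range(steps):
        x = toy_denoise_step(x, upsampled, strength=0.3)
    return x
```

Because the second stage starts from a partially noised image instead of pure noise, it only has to refine detail, which is where the efficiency of a cascaded design comes from.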

According to THUDM’s research, CogView3 outperforms SDXL in human evaluations with a win rate of up to 77.0%. The paper also reports that CogView3 needs only about half of SDXL’s inference time, and that a distilled variant reaches comparable quality in roughly one-tenth of SDXL’s inference time, which has significant implications for practical applications.

CogView-3Plus-3B: Lightweight DiT Model

Alongside CogView3, THUDM also open-sourced CogView-3Plus-3B, an image generation model based on the DiT (Diffusion Transformer) architecture. DiT replaces the U-Net backbone commonly used in diffusion models with a Transformer that operates on a sequence of image patches, combining the generative strengths of diffusion with the scalability of Transformers.
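The step that lets a Transformer process an image is "patchifying": cutting the image (in practice usually a latent representation of it) into fixed-size patches and treating each patch as one token. The sketch below shows only that step in pure Python. The function name `patchify` and the toy 4x4 input are illustrative assumptions, not code from the CogView-3Plus-3B repository.

```python
def patchify(image, patch):
    """Split a 2D image (list of rows) into flattened square patches.

    Returns one list per patch, in row-major patch order. Each list is a
    "token" that a Transformer would then embed and attend over.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append(
                [image[top + dy][left + dx] for dy in range(patch) for dx in range(patch)]
            )
    return tokens


if __name__ == "__main__":
    # A 4x4 image with patch size 2 yields 4 tokens of 4 values each.
    image = [[row * 4 + col for col in range(4)] for row in range(4)]
    for token in patchify(image, 2):
        print(token)
```

The number of tokens grows with the square of the image side length, which is one reason patch size and latent resolution strongly affect the cost of a DiT-style model.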

With 3B parameters, CogView-3Plus-3B is a relatively lightweight model. It aims to provide faster inference and lower resource requirements while maintaining high-quality output.
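For a rough sense of what "3B parameters" means for hardware, the back-of-the-envelope calculation below estimates the memory needed just to hold the weights. It assumes exactly 3 billion parameters and covers the diffusion model's weights only. Any text encoder, image decoder, activations, and framework overhead are not included, so real memory use will be higher.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to store the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    params = 3_000_000_000  # assumption: "3B" taken as exactly 3e9
    for name, size in [("fp32", 4), ("fp16/bf16", 2)]:
        print(f"{name}: {weight_memory_gib(params, size):.2f} GiB")
```

Under these assumptions the weights take roughly 5.6 GiB in 16-bit precision and about twice that in 32-bit precision.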

Open Source Contribution

By open-sourcing CogView3 and CogView-3Plus-3B, THUDM gives the research community models it can study and build on, and gives developers and businesses a way to integrate advanced image generation into practical applications. The release should help advance both text-to-image research and its applications.

Future Prospects

With CogView3 and CogView-3Plus-3B now open source, we can expect more applications built on them. Potential uses range from creative design to content generation to visual aid tools.

The release also gives other research teams a useful reference and may inspire further innovations and breakthroughs in image generation.