PixelFlow: Generative Models Working Directly in Pixel Space
Researchers from the University of Hong Kong and Adobe have jointly developed PixelFlow, a family of image generation models that work directly in raw pixel space. Unlike the currently dominant latent space models, PixelFlow adopts a completely new approach to image generation.
Innovative Features
The most significant innovation of PixelFlow is that it operates directly in raw pixel space, rather than in latent space like most mainstream models. This approach simplifies the image generation process with the following advantages:
- No dependency on a pre-trained Variational Autoencoder (VAE)
- Support for end-to-end training of the entire model
- Affordable computational cost in pixel space through efficient cascade flow modeling (a toy sketch of this multi-resolution idea follows this list)
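To make the cascade idea concrete, here is a minimal, hypothetical sketch of multi-resolution flow matching in pixel space. It is not the authors' implementation: the `ToyVelocityNet`, the resolution schedule, the step counts, and the renoising rule between stages are all illustrative assumptions.

```python
# Toy sketch of cascade flow matching in pixel space (not the PixelFlow code).
# A placeholder velocity network stands in for the real model; the resolution
# schedule, step counts, and renoising rule are illustrative assumptions.
import torch
import torch.nn.functional as F


class ToyVelocityNet(torch.nn.Module):
    """Placeholder flow network: predicts a velocity field v(x_t, t)."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = torch.nn.Conv2d(channels + 1, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the scalar timestep as an extra input channel.
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[-2:])
        return self.net(torch.cat([x, t_map], dim=1))


@torch.no_grad()
def cascade_flow_sample(model, batch=1, base_res=32, stages=(32, 64, 128, 256), steps=16):
    """Generate an image stage-by-stage, entirely in pixel space.

    Each stage integrates x <- x + v(x, t) * dt with simple Euler steps,
    then upsamples the result to seed the next, higher-resolution stage.
    """
    x = torch.randn(batch, 3, base_res, base_res)  # start from pure noise
    for res in stages:
        if x.shape[-1] != res:
            # Move to the next resolution and partially renoise so the stage
            # still has something to refine (the mixing rule is an assumption).
            x = F.interpolate(x, size=(res, res), mode="bilinear", align_corners=False)
            x = 0.7 * x + 0.3 * torch.randn_like(x)
        dt = 1.0 / steps
        for i in range(steps):
            t = torch.full((batch,), i * dt)
            x = x + model(x, t) * dt  # Euler step along the learned flow
    return x.clamp(-1, 1)


if __name__ == "__main__":
    sample = cascade_flow_sample(ToyVelocityNet())
    print(sample.shape)  # torch.Size([1, 3, 256, 256])
```

Because most integration steps run at low resolution, the expensive full-resolution stage only handles the final refinement, which is what keeps pixel-space generation computationally affordable.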
In class-conditional image generation on ImageNet at 256×256, PixelFlow achieves an FID of 1.98, while its text-to-image results show strong image quality, artistry, and semantic control.
Online Demo
The PixelFlow team provides a HuggingFace online demo for users to experience the model’s image generation capabilities: https://huggingface.co/spaces/ShoufaChen/PixelFlow
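For programmatic access, the demo Space can in principle be queried with the `gradio_client` library. The sketch below only lists the Space's API endpoints rather than calling them, since their parameter names are not documented here; it assumes the Space is public and accepts API calls.

```python
# Hypothetical sketch: inspect the PixelFlow demo Space with gradio_client.
# Endpoint names and parameters are not documented in this article, so we
# list them instead of calling them with guessed arguments.
from gradio_client import Client

client = Client("ShoufaChen/PixelFlow")
client.view_api()  # prints the callable endpoints exposed by the demo
```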
Model Library
PixelFlow currently offers two models:
- Class-to-image model: 677M parameters, FID score of 1.98
- Text-to-image model: 882M parameters
Detailed information about these two models is as follows:
| Model Name | Task | Parameters | FID (ImageNet 256×256) | Model Weights |
|---|---|---|---|---|
| PixelFlow | Class-to-image | 677M | 1.98 | 🤗 |
| PixelFlow | Text-to-image | 882M | N/A | 🤗 |
Both models are available on the HuggingFace platform.
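The weights can presumably be fetched with standard `huggingface_hub` tooling; the repository id below is an assumption based on the demo Space name and may not match the actual model repositories.

```python
# Hedged sketch: download released checkpoints with huggingface_hub.
# "ShoufaChen/PixelFlow" is an assumed repo id, not confirmed by this article.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ShoufaChen/PixelFlow")
print(local_dir)  # local folder containing the downloaded checkpoint files
```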
Future Outlook
The research team hopes this paradigm will open up new opportunities for next-generation visual generation models. By removing the latent-space stage, PixelFlow may lower the barrier to developing generative models and encourage more efficient, lightweight image generation methods.