ComfyUI Glossary | ComfyUI Wiki

CLIP

CLIP (Contrastive Language-Image Pre-training) is a model developed by OpenAI that learns a shared embedding space for images and text. Because matching image-text pairs end up close together in that space, CLIP can score how well a piece of text matches an image, which helps with guiding image generation, describing images, and image classification.
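
As a rough illustration of how a shared embedding space lets text and images be compared, the sketch below scores a toy image embedding against two toy caption embeddings using cosine similarity. The vectors and captions are made-up values for illustration, not outputs of a real CLIP model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings; real embeddings have far more dimensions.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.9, 0.3],
}

# Score every caption against the image and keep the closest one.
scores = {caption: cosine_similarity(image_embedding, vec)
          for caption, vec in text_embeddings.items()}
best_caption = max(scores, key=scores.get)
print(best_caption)
```

The caption whose embedding points in nearly the same direction as the image embedding gets the highest score.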

Diffusion model

A diffusion model is a generative model trained by gradually adding noise to data and learning to remove it. Training involves two processes: a forward process that adds noise step by step, and a reverse process in which the model learns to undo that noise. Once trained, the model generates new data by starting from noise and removing it step by step. Diffusion models have shown strong capabilities in generating images and other types of data.
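
The forward (noising) process can be sketched in a few lines. This is a minimal sketch assuming the common DDPM-style closed form, in which a hypothetical parameter `alpha_bar` (between 0 and 1) sets how much of the original signal survives at a given step; the glossary itself does not specify a particular formulation.

```python
import math
import random

def forward_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # toy "image" of four pixel values

# alpha_bar close to 1: mostly signal. alpha_bar close to 0: mostly noise.
slightly_noisy, _ = forward_noise(x0, alpha_bar=0.99, rng=rng)
mostly_noise, _ = forward_noise(x0, alpha_bar=0.01, rng=rng)
```

During training, the model sees samples like `slightly_noisy` or `mostly_noise` and is asked to predict the noise that was added.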

Denoise

Denoising is the process of recovering a clean signal from noisy images or data. In a diffusion model, the model removes noise gradually over a series of steps, so that the generated image ends up as close as possible to a real image.
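
To make the idea concrete, the sketch below recovers a clean signal from a noisy one when the added noise is known exactly. It assumes the DDPM-style relation x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps; the name `alpha_bar` and the toy values are illustrative. In a real diffusion model the noise is predicted by a neural network, so the recovery is only approximate and is repeated over many steps.

```python
import math

def estimate_clean(xt, predicted_eps, alpha_bar):
    """Invert x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps for x0."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, predicted_eps)]

# Toy values: build a noisy sample from a known clean signal and known noise.
alpha_bar = 0.5
x0 = [1.0, -1.0, 0.5]
eps = [0.3, -0.2, 0.1]
xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
      for x, e in zip(x0, eps)]

# With a perfect noise prediction, the clean signal is recovered (up to rounding).
recovered = estimate_clean(xt, eps, alpha_bar)
```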

Latent

A latent is the hidden representation a generative model uses to stand in for a piece of data. It is an abstract, lower-dimensional representation of the data (e.g., an image), produced by an encoder, that captures the data's core characteristics.
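
To give a sense of scale, the sketch below computes the latent shape for an image. The helper `latent_shape` is hypothetical, and it assumes an encoder that shrinks each spatial dimension by a factor of 8 and produces 4 channels, as in Stable Diffusion 1.x-style VAEs; other models use different factors and channel counts.

```python
def latent_shape(height, width, downscale=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for an image of the given pixel size."""
    if height % downscale or width % downscale:
        raise ValueError("image dimensions must be multiples of the downscale factor")
    return (latent_channels, height // downscale, width // downscale)

image_values = 3 * 512 * 512      # RGB values in a 512x512 image
c, h, w = latent_shape(512, 512)
latent_values = c * h * w         # values in the corresponding latent

print((c, h, w), image_values / latent_values)  # prints (4, 64, 64) 48.0
```

Under these assumptions the latent holds 48 times fewer values than the image, which is why working on latents is much cheaper than working on pixels.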

Latent space

Latent space is the space in which latent representations live; it is usually much lower-dimensional than the original data. In generative models, data is first mapped into the latent space by an encoder, and new data is then generated by decoding points from it. Because nearby points in a well-structured latent space tend to decode to similar outputs, the model can generate a wide variety of complex data samples.
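
One way to picture moving through latent space is to interpolate between two latent vectors. The sketch below uses plain linear interpolation on made-up three-dimensional vectors; in practice each intermediate point would be passed to a decoder (not shown) to produce an in-between sample.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors; t=0 gives z_a, t=1 gives z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

z_a = [0.0, 1.0, -2.0]  # hypothetical latent for sample A
z_b = [2.0, 3.0, 0.0]   # hypothetical latent for sample B

# Three points along the straight line from z_a to z_b.
path = [lerp(z_a, z_b, t) for t in (0.0, 0.5, 1.0)]
```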

VAE

VAE (Variational Autoencoder) is a generative model that learns a latent representation of data through an encoder and a decoder. The encoder maps input data to a distribution over the latent space, and the decoder reconstructs or generates data from samples of that distribution. A VAE is trained to maximize the likelihood of the data (in practice, a lower bound on it) while minimizing the divergence between the encoder's latent distribution and a predefined prior distribution.
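
The divergence term can be written out directly in a common special case. The sketch below assumes a diagonal Gaussian encoder output and a standard normal prior, which is a frequent choice but not something the glossary specifies; the function names are illustrative. It also shows the usual reparameterized sampling of a latent from the encoder's mean and log-variance.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

# Toy encoder output for a 2-dimensional latent.
mu, log_var = [0.5, -0.5], [0.0, 0.0]

kl = kl_to_standard_normal(mu, log_var)            # penalty for drifting from the prior
z = sample_latent(mu, log_var, random.Random(0))   # latent passed on to the decoder
```

The divergence is zero only when the encoder's distribution matches the prior exactly, and it grows as the mean moves away from zero or the variance moves away from one.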
