ID-Patch: A New Method for Multi-Identity Personalized Group Photo Generation
Diffusion models, the mainstream technology for text-to-image generation, are widely used in artistic creation and content production. While single-person image generation has matured considerably, multi-person scene generation still faces challenges. Users often need to generate group photos or multi-character scenes, such as completing a group photo or creating a multi-character advertisement.
The main current challenge is identity feature leakage: when generating multi-person images, the facial features of different individuals tend to blend together, making it difficult to preserve each person's unique characteristics. In addition, users want precise control over each person's position and pose to achieve more natural-looking results.
Introduction to ID-Patch Method
ByteDance and Michigan State University jointly proposed the ID-Patch method, which makes significant progress in identity preservation, position control, and generation efficiency. Its core innovations are:
- ID Patch: Generates a unique identity patch for each person and places it precisely at the specified location in the conditional image, achieving spatial identity control (see the sketch after this list).
- ID Embedding: Combines identity features with text embeddings to enhance facial similarity and identity consistency.
- Efficient Inference: ID-Patch generates images 7 times faster than OMG and has lower computational overhead than InstantFamily.
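To make the spatial-control idea concrete, here is a minimal sketch of how per-identity patches could be composited onto a conditional image at specified face positions. The function name `paste_id_patches` and the hard-overwrite compositing are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

def paste_id_patches(cond_image, id_patches, positions):
    """Overlay each person's ID patch onto the conditional image.

    cond_image: (H, W, 3) uint8 spatial condition (e.g., a pose map).
    id_patches: list of (h, w, 3) uint8 patches, one per identity.
    positions:  list of (y, x) top-left pixel coordinates per face.
    """
    out = cond_image.copy()
    for patch, (y, x) in zip(id_patches, positions):
        h, w = patch.shape[:2]
        # Hard overwrite: the patch itself tells the model
        # "this identity belongs at this location".
        out[y:y + h, x:x + w] = patch
    return out

# Toy usage: two 64x64 stand-in patches on a blank 512x512 condition.
cond = np.zeros((512, 512, 3), dtype=np.uint8)
patches = [np.full((64, 64, 3), 200, np.uint8),
           np.full((64, 64, 3), 80, np.uint8)]
composited = paste_id_patches(cond, patches, [(100, 120), (100, 320)])
```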
Results Showcase
The comparison below shows ID-Patch against mainstream methods. From left to right: conditional input, OMG (InstantID), InstantFamily, and ID-Patch. ID-Patch preserves each person's identity details more faithfully, avoiding issues such as missing hair, hand artifacts, and identity confusion.
More Generation Examples
- Generating images with arbitrary poses using ID-Patch
- Plug-and-play: conditional generation with Canny edges
- The ID-Patch method workflow (detailed in the next section)
Method Overview
The ID-Patch method achieves multi-identity personalized group photo generation through the following process:
- Input text prompts (e.g., “two people shaking hands”), multiple face images, and their positions.
- Extract facial features for each person and generate ID patches and ID embeddings.
- Overlay ID patches onto the conditional image at specified positions to achieve spatial control.
- Combine ID embeddings with text embeddings to enhance facial similarity (a minimal sketch follows this list).
- Generate the final image through the diffusion model, ensuring accurate identity and position for each person.
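As a rough illustration of the embedding-combination step, the sketch below concatenates identity embeddings with the text embeddings along the token dimension, so the diffusion model's cross-attention can attend to both the prompt and each identity. The dimensions (77 prompt tokens, width 768) follow common CLIP-based diffusion setups and are assumptions, not values from the paper:

```python
import torch

# Stand-in tensors: 77 prompt tokens and 2 identity embeddings, all
# projected to a shared width of 768 (a common text-encoder dimension,
# assumed here).
text_embeds = torch.randn(77, 768)   # from the text encoder
id_embeds = torch.randn(2, 768)      # one embedding per reference face

# Concatenate along the token axis; cross-attention in the denoising
# U-Net then sees prompt tokens and identity tokens in one sequence.
context = torch.cat([text_embeds, id_embeds], dim=0)
print(context.shape)  # torch.Size([79, 768])
```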
Experiments and Conclusions
Experimental results show that ID-Patch outperforms existing methods in facial similarity, accuracy of associating each identity with its intended position, and generation efficiency. Its unique patch mechanism and efficient inference pipeline provide a new solution for multi-identity image generation.
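For context on how facial similarity is typically scored in such evaluations, generated faces are compared against reference faces via cosine similarity between face-recognition embeddings. This is a generic sketch of the standard metric, not the paper's exact evaluation code; the embedding size of 512 matches common face models such as ArcFace:

```python
import numpy as np

def face_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two face-recognition embeddings."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(a @ b)

# Example with random stand-in vectors; real embeddings would come from
# a face-recognition model (e.g., ArcFace, which outputs 512-d vectors).
ref, gen = np.random.randn(512), np.random.randn(512)
print(face_similarity(ref, gen))
```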
Related Links
This content is based on the official paper, project page, and related materials, and aims to provide an accessible technical interpretation for readers in the AI image generation field. For more details, please refer to the official paper and project page.