2022
DOI: 10.48550/arxiv.2210.09276
Preprint

Imagic: Text-Based Real Image Editing with Diffusion Models

Abstract: Text-conditioned image editing has recently attracted considerable interest. However, most methods are currently either limited to specific editing types (e.g., object overlay, style transfer), or apply to synthetically generated images, or require multiple input images of a common object. In this paper we demonstrate, for the very first time, the ability to apply complex (e.g., non-rigid) text-guided semantic edits to a single real image. For example, we can change the posture and composition of one or multip…

Cited by 45 publications (82 citation statements)
References 34 publications
“…Another related line of work aims to introduce specific concepts to a pre-trained text-to-image model by learning to map a set of images to a "word" in the embedding space of the model [18,25,41]. Several works have also explored providing users with more control over the synthesis process solely through the use of the input text prompt [8,20,24,46].…”
Section: Related Work
confidence: 99%
“…This unprecedented capability became instantly popular, as users were able to synthesize high-quality images by simply describing the desired result in natural language, as we demonstrate in Figure 9.5. These models have become a centerpiece in an ongoing and quickly advancing research area, as they have been adapted for image editing [147,202], object recontextualization [241,95], 3D object generation [220], and more [119,129,213,346].…”
Section: Regularization by Denoising (RED)
confidence: 99%
“…To demonstrate the difficulty of eL-TGIM, we adopt several recent powerful image editors, including transformer-based architectures (Muse [1]), diffusion-based pixel-level model (Imagen [2] based Imagic [3]) and latent-level model (DALLE2 [4]). Results are shown in Fig.…”
Section: Generated Mask
confidence: 99%
“…1. Comparison between our method and powerful image editors, including transformer-based architectures (Muse [1]), diffusion-based pixel-level model (Imagen [2] based Imagic [3]) and latent-level model (DALLE2 [4]). Analyses are in the second paragraph of Sec.…”
Section: Introduction
confidence: 99%