2023
DOI: 10.1145/3592451

UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image

Abstract: Text-driven image generation methods have shown impressive results recently, allowing casual users to generate high quality images by providing textual descriptions. However, similar capabilities for editing existing images are still out of reach. Text-driven image editing methods usually need edit masks, struggle with edits that require significant visual changes and cannot easily keep specific details of the edited portion. In this paper we make the observation that image-generation models can be converted t…

Cited by 30 publications (4 citation statements) | References 35 publications
“…Specific methods [14][15][16][17][32] of image editing can also be utilized for image stylization. Kwon et al. [14] propose a diffusion-based unsupervised image translation (DiffuseIT) method, which employs a pre-trained vision transformer to decouple semantic and structural features.…”
Section: Text-based Image Editing (citation type: mentioning, confidence: 99%)
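
A minimal sketch of the decoupling described in the statement above, assuming timm's pre-trained ViT. DiffuseIT actually uses a self-supervised (DINO) ViT and compares attention keys and [CLS] tokens across images; the simple [CLS]/patch-token split below only illustrates how one network can supply both semantic and structural features:

import torch
import timm

# Frozen, pre-trained ViT. DiffuseIT uses a DINO ViT; any timm ViT illustrates the split.
vit = timm.create_model("vit_base_patch16_224", pretrained=True).eval()

image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed input image
with torch.no_grad():
    tokens = vit.forward_features(image)  # (1, 197, 768): [CLS] token + 14x14 patch tokens

semantic = tokens[:, 0]    # [CLS] token: global semantics of the image
structure = tokens[:, 1:]  # patch tokens: spatial layout, i.e. structure
print(semantic.shape, structure.shape)  # torch.Size([1, 768]) torch.Size([1, 196, 768])
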
“…With their outstanding ability to produce rich stylizations, many DM-based methods [11][12][13] produce high-quality results. Furthermore, text-driven image stylization is feasible through image editing techniques [14][15][16][17]. Nonetheless, these techniques require textual descriptions of the content images or other additional inputs.…”
Section: Introduction (citation type: mentioning, confidence: 99%)
“…Finetuning the entire denoising model allows the model to better learn specific features of images and more accurately interpret textual prompts, resulting in edits that more closely align with user intent. UniTune [172] finetunes the diffusion model on a single base image during the tuning phase, encouraging the model to produce images similar to the base image. During the sampling phase, a modified sampling process is used to balance fidelity to the base image and alignment to the editing prompt.…”
Section: Denoising Model Finetuning (citation type: mentioning, confidence: 99%)
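
A minimal sketch of the two phases quoted above, assuming the Hugging Face diffusers API. The base checkpoint, the rare token "[v]", the learning rate, and the step counts are illustrative assumptions, and the "modified sampling process" is approximated with an SDEdit-style partially noised start rather than UniTune's exact procedure:

import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from diffusers import (DDPMScheduler, StableDiffusionImg2ImgPipeline,
                       StableDiffusionPipeline)

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

# Tuning phase: overfit the denoiser to a single base image keyed to a rare token.
base = Image.open("base.png").convert("RGB").resize((512, 512))
pixels = torch.from_numpy(np.array(base)).float() / 127.5 - 1.0
pixels = pixels.permute(2, 0, 1).unsqueeze(0).to(device)
with torch.no_grad():
    latents0 = pipe.vae.encode(pixels).latent_dist.sample() * pipe.vae.config.scaling_factor
    ids = pipe.tokenizer("[v] photo", padding="max_length",
                         max_length=pipe.tokenizer.model_max_length,
                         truncation=True, return_tensors="pt").input_ids.to(device)
    text_emb = pipe.text_encoder(ids)[0]

train_sched = DDPMScheduler.from_config(pipe.scheduler.config)
opt = torch.optim.AdamW(pipe.unet.parameters(), lr=1e-5)
for _ in range(100):  # a short loop suffices to memorize one image
    noise = torch.randn_like(latents0)
    t = torch.randint(0, train_sched.config.num_train_timesteps, (1,), device=device)
    noisy = train_sched.add_noise(latents0, noise, t)
    loss = F.mse_loss(pipe.unet(noisy, t, encoder_hidden_states=text_emb).sample, noise)
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling phase: denoise from a partially noised base image. Lower `strength`
# keeps more of the base image; higher `guidance_scale` pushes the result toward
# the edit prompt. Together they set the fidelity/alignment trade-off.
img2img = StableDiffusionImg2ImgPipeline(**pipe.components)
edited = img2img(prompt="[v] photo, wearing a red hat", image=base,
                 strength=0.7, guidance_scale=9.0).images[0]
edited.save("edited.png")
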
“…StyleDiffusion [179] introduces a Mapping Network that maps features of the input image to an embedding space aligned with the embedding space of textual prompts, effectively generating a prompt embedding. Cross-attention layers are…”
[Figure residue: a taxonomy of testing-time finetuning approaches, listing UniTune [172], Custom-Edit [173], and KV-Inversion [174] under "Denoising Model Finetuning", alongside an "Embedding Finetuning" branch.]
Section: Guidance With a Hypernetwork (citation type: mentioning, confidence: 99%)
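
A minimal sketch of the mapping-network idea quoted above, assuming pooled image features of dimension 768 and a Stable Diffusion-style prompt-embedding space of 77 tokens by 768 channels; the architecture and dimensions are illustrative assumptions, not StyleDiffusion's actual design:

import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Maps pooled image features to a sequence of pseudo prompt-token embeddings."""
    def __init__(self, img_dim=768, txt_dim=768, n_tokens=77):
        super().__init__()
        self.n_tokens, self.txt_dim = n_tokens, txt_dim
        self.net = nn.Sequential(
            nn.Linear(img_dim, 1024), nn.GELU(),
            nn.Linear(1024, n_tokens * txt_dim),
        )

    def forward(self, img_feats):  # img_feats: (B, img_dim), e.g. CLIP image features
        out = self.net(img_feats)  # (B, n_tokens * txt_dim)
        return out.view(-1, self.n_tokens, self.txt_dim)

# The (B, 77, 768) output can stand in for a text-encoder output wherever the
# denoiser's cross-attention layers expect a prompt embedding.
mapper = MappingNetwork()
prompt_embedding = mapper(torch.randn(1, 768))  # stand-in for pooled image features
print(prompt_embedding.shape)  # torch.Size([1, 77, 768])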