2021
DOI: 10.48550/arxiv.2111.13333
Preprint

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

[Figure 1. Comparisons on disentangled image manipulation between the StyleCLIP [30] baseline and our Predict, Prevent, and Evaluate (PPE). Rows show the original image, StyleCLIP, and ours; columns show the attributes "with bangs", "double chin", "black hair", "with wrinkles", and "pale". Ours manipulates only the commanded attribute (as indicated under each column) while leaving the others unchanged.]

Cited by 1 publication (2 citation statements)
References 34 publications (57 reference statements)
“…CLIP-based approaches. Benefiting from the large-scale visual-language training, CLIP has shown impressive capability and generalizability on a wide range of tasks, such as text-driven image manipulation [9,24,38,59], image captioning [14,33], view synthesis [17], object detection [12,49,72], and semantic segmentation [42,73]. These applications mainly focus on building the semantic relationship between texts and visual entities, and hence they suffer less from linguistic ambiguity.…”
Section: Related Work
confidence: 99%
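The image-text matching these works build on can be sketched in a few lines. Below is a minimal illustration using the Hugging Face transformers CLIP implementation with the public openai/clip-vit-base-patch32 checkpoint; the image path and candidate captions are placeholders chosen to echo Figure 1, not taken from the cited papers.

```python
# Minimal sketch: score an image against candidate text descriptions with CLIP.
# The image file and captions below are placeholders for illustration only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("face.jpg")  # placeholder input image
texts = ["a face with bangs", "a face with a double chin", "a face with black hair"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds cosine similarities scaled by CLIP's learned temperature;
# softmax turns them into a relative match score over the candidate texts.
probs = outputs.logits_per_image.softmax(dim=-1)
for text, p in zip(texts, probs[0].tolist()):
    print(f"{p:.3f}  {text}")
```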
“…The problem of harnessing CLIP for perception assessment can be more challenging compared to existing works related to objective attributes, such as image manipulation [9,24,38,59], object detection [12,49,72], and semantic segmentation [42,73]. Specifically, CLIP is known to be sensitive to the choices of prompts [41], and perception is an abstract concept with no standardized adjectives, especially for the feel of images.…”
Section: Introduction
confidence: 99%
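The prompt sensitivity noted above is easy to reproduce. The sketch below is a rough illustration, not the cited paper's method: it embeds the same attribute under several hand-written templates, shows that the normalized text embeddings differ, and applies the common mitigation of averaging them (prompt ensembling, as in the original CLIP zero-shot evaluation). The attribute and templates are invented placeholders.

```python
# Sketch: how prompt wording shifts CLIP text embeddings, and prompt ensembling
# as a mitigation. Attribute and templates are illustrative placeholders.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

attribute = "sharp"  # an abstract, perception-style adjective
templates = [
    "a photo that looks {}",
    "a {} photo",
    "an image with a {} feel",
]

with torch.no_grad():
    tokens = tokenizer([t.format(attribute) for t in templates],
                       padding=True, return_tensors="pt")
    emb = model.get_text_features(**tokens)
    emb = emb / emb.norm(dim=-1, keepdim=True)

# Pairwise cosine similarity between prompt variants: off-diagonal values
# below 1.0 show that rephrasing alone moves the embedding.
print(emb @ emb.T)

# Prompt ensembling: average the normalized embeddings into a single text
# direction, then renormalize, to reduce dependence on any one wording.
ensemble = emb.mean(dim=0)
ensemble = ensemble / ensemble.norm()
```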