2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01753
CLIPstyler: Image Style Transfer with a Single Text Condition

Cited by 135 publications (44 citation statements)
References 27 publications

“…Recently, several studies have attempted to replace style images with texts that describe certain styles. By using CLIP, a pre-trained language-image embedding model, Kwon et al. [12] proposed a patch-wise CLIP loss to align text-image pairs of source and target in the CLIP space. However, CLIPstyler trains a style-specific model for each target style, requiring extra time and resources.…”
Section: Style Transfer
confidence: 99%
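The patch-wise CLIP loss referenced in this statement can be sketched roughly as follows: random patches are cropped from the stylized output, and the CLIP-space direction from the content image to each patch is aligned with the direction from a neutral source prompt to the style prompt. This is a minimal sketch, assuming the openai `clip` package and images given as (1, 3, H, W) tensors in [0, 1]; the patch count, crop size, source prompt, augmentations, and the threshold rejection used in the actual paper are simplified or omitted here.

```python
import torch
import torch.nn.functional as F
import clip
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# CLIP's input normalization and resolution for ViT-B/32
clip_norm = transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                                 (0.26862954, 0.26130258, 0.27577711))
to_clip = transforms.Compose([transforms.Resize((224, 224)), clip_norm])

def encode_text(prompt):
    return F.normalize(model.encode_text(clip.tokenize([prompt]).to(device)), dim=-1)

def patchwise_clip_loss(stylized, content, style_text, source_text="a Photo",
                        n_patches=16, patch_size=128):
    # Text direction: from a neutral source prompt to the target style prompt.
    delta_t = F.normalize(encode_text(style_text) - encode_text(source_text), dim=-1)
    f_content = F.normalize(model.encode_image(to_clip(content)), dim=-1)
    crop = transforms.RandomCrop(patch_size)
    losses = []
    for _ in range(n_patches):
        # Image direction: from the content image to a random patch of the output.
        f_patch = F.normalize(model.encode_image(to_clip(crop(stylized))), dim=-1)
        delta_i = F.normalize(f_patch - f_content, dim=-1)
        losses.append(1.0 - torch.cosine_similarity(delta_i, delta_t))
    return torch.stack(losses).mean()
```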
“…We assume that the image embeddings obtained from CLIP's encoder can also be divided into style f_s^i and content f_c^i parts in the CLIP embedding space. Therefore, CLIPstyler [12] can achieve text-based style transfer by utilizing CLIP's features. It uses the text encoder to encode the text describing the style as f_s^t.…”
Section: Styles in CLIP Space
confidence: 99%
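The decomposition assumed in this statement, an image embedding split into content and style parts, can be illustrated, purely hypothetically, by projecting the CLIP image embedding onto a content-describing text direction and treating the residual as the style part. This is not the cited method; the content prompt and the projection are illustrative choices, again assuming the openai `clip` package.

```python
import torch
import torch.nn.functional as F
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def split_style_content(image_path, content_prompt):
    # f_i: normalized CLIP embedding of the image
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    f_i = F.normalize(model.encode_image(image), dim=-1)
    # f_c_t: embedding of a text describing the image content (hypothetical choice)
    f_c_t = F.normalize(model.encode_text(clip.tokenize([content_prompt]).to(device)), dim=-1)
    # Project onto the content direction; treat the residual as the "style" part,
    # mirroring the assumed additive split f_i ~ f_c^i + f_s^i.
    f_c_i = (f_i @ f_c_t.T) * f_c_t
    f_s_i = f_i - f_c_i
    return f_c_i, f_s_i
```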
“…Our goal is to recognize these descriptions automatically. CLIPstyler [38] proposed patchCLIP for transferring semantic texture information on text conditions. GLIDE [47] and DALL-E 2 [50] focus on open domain image synthesis.…”
Section: Related Work
confidence: 99%
“…CLIP [49] was presented to acquire visual concepts with natural language supervision and can provide the similarity scores between texts and images. Several works have used CLIP to steer generative models, such as GANs [21,38,48], toward user-defined text prompts. In this paper, we leverage a pre-trained CLIP model for text-driven and image-driven art paintings synthesis.…”
Section: CLIP-based Multimodal Guidance
confidence: 99%
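The similarity scores mentioned in this statement come directly from a pre-trained CLIP model. A minimal sketch using the openai `clip` package follows; the image path and the candidate prompts are placeholder examples.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("painting.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["an oil painting", "a watercolor", "a photograph"]).to(device)

with torch.no_grad():
    # logits_per_image holds scaled cosine similarities between the image and each prompt
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1)  # relative match across the prompts

print(probs)
```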