2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01371
FashionVLP: Vision Language Transformer for Fashion Retrieval with Feedback

Cited by 55 publications
(16 citation statements)
References 38 publications
“…TGIR evaluation. We compare our FAME-ViL with TGIR-specialist methods [2,22,48,75,90] and the state-of-the-art fashion-focused V+L model FashionViL [24] under the original protocol used by FashionIQ [83]. The results are given in Tab.…”
Section: Comparisons With Prior Art Methods
confidence: 99%
“…(4) Fashion Image Captioning (FIC) generates a caption to describe the given image with semantically meaningful, fine-grained, and accurate words [85]. Many recent works have been trying to address these fashion tasks through VLP [21,22,24,33,55,87,94]. Most of them focus on the pre-training, then simply fine-tune the pre-trained model on each downstream task independently.…”
Section: Related Work
confidence: 99%
“…In the vision-language field, transformer architectures have achieved great success in many tasks, e.g., vision-and-language pre-training [38], [57], [58], image generation [59], visual question answering [60], open-vocabulary detection [61], image retrieval [62], vision-and-language navigation [63], etc. Lu et al [57] design a co-attention mechanism to incorporate language-attended vision features into language features.…”
Section: Transformer
confidence: 99%
“…More Applications. Besides VLP for standard VL tasks, VLP has also been applied to tackle (i) TextVQA (Singh et al, 2019) and TextCaps (Sidorov et al, 2020) tasks that require an AI system to comprehend scene text in order to perform VQA and captioning, such as TAP (Yang et al, 2021d) and LaTr (Biten et al, 2022); (ii) visual dialog (Das et al, 2017) that requires an AI system to chat about an input image, such as VisDial-BERT (Murahari et al, 2020) and VD-BERT (Wang et al, 2020b); (iii) fashion-domain tasks, such as Kaleido-BERT (Zhuge et al, 2021) and Fashion-VLP (Goenka et al, 2022); and (iv) vision-language navigation (VLN), such as PREVALENT (Hao et al, 2020) and VLN-BERT (Hong et al, 2021), to name a few. A detailed literature review on VLN can be found in Gu et al (2022b).…”
Section: VLP For L Big Models
confidence: 99%