2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00553
VinVL: Revisiting Visual Representations in Vision-Language Models

Cited by 664 publications (449 citation statements)
References 24 publications
“…They tried to build adaptive captioning models that could work well with multiple languages instead of only specific ones. Among captioning models, some recent studies that enhance performance with BERT-based models [15,28,29] are also promising.…”
Section: Previous Approaches
confidence: 99%
“…Besides using the Bottom-Up Top-Down architecture for extracting visual objects from an image, we explore two more pre-trained models, namely RelDN and VinVL [15].…”
Section: Objects Representation Exploration
confidence: 99%
“…• image texts, object labels, scene texts • object visual features, scene visual features Object labels and features are extracted with VinVL (Revisiting Visual Representations in Vision-Language Models) [3], which can generate representations of a richer collection of visual objects and concepts. Scene texts and scene visual features mainly come from the public Microsoft OCR API.…”
Section: Pre-training Strategy
confidence: 99%
“…In addition to the main results in Table 3, we also evaluated existing OSCAR [35] / VinVL [57] models on AdVQA, since both models are known to perform well on VQA. Note that the training set of the off-the-shelf OSCAR and VinVL models includes COCO val2014 data, which overlaps with our validation set (COCO 2017).…”
Section: A Training Details
confidence: 99%
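The overlap the last citation statement warns about (COCO val2014 images reappearing in the COCO 2017 validation split) can be detected by intersecting the two splits' image ids. The sketch below is a minimal, hypothetical illustration: the id lists are placeholders, since in practice they would be loaded from the real COCO annotation files (e.g. `instances_val2014.json` and `instances_val2017.json`).

```python
def find_overlap(pretrain_ids, eval_ids):
    """Return the image ids present in both splits (potential leakage)."""
    return sorted(set(pretrain_ids) & set(eval_ids))

# Hypothetical stand-in ids; real ids come from the COCO annotation files.
val2014_ids = [101, 202, 303, 404]
val2017_ids = [303, 404, 505]

overlap = find_overlap(val2014_ids, val2017_ids)
print(overlap)  # ids shared by both splits
```

Any non-empty result means the off-the-shelf model has already seen part of the evaluation set during pre-training, so those images should be excluded or the caveat reported, as the authors do here.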