2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00697
Domain-robust VQA with diverse datasets and methods but no target labels

Cited by 15 publications (7 citation statements)
References 40 publications
“…Moreover, when the exact nature of the distribution shift between train and test splits is known (such as in (Agrawal et al, 2018)), approaches developed to tackle such shifts tend to rely on explicit knowledge of how the OOD splits were constructed, resulting in an inflated sense of progress (Teney et al, 2020). Similar to us, Zhang et al (2021); Hudson and Manning (2019) also present some experimental results on VQA OOD evaluation; however, they do so in a limited manner (e.g., they do not consider all pairs of datasets and do not evaluate the effect of multimodal pretraining). To the best of our knowledge, ours is the first work to extensively quantify the extent of IID-to-OOD performance drops in current VQA models and to study the effect of several factors: answer overlap, multimodal pretraining, generative vs. discriminative modeling, and a stringent evaluation metric.…”
Section: Related Work (mentioning, confidence: 99%)
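Two of the factors named in this statement, answer overlap and a stringent evaluation metric, can be illustrated with a minimal sketch. The function names and toy answer lists below are assumptions for illustration only, not code or data from the cited papers.

def exact_match_accuracy(predictions, references):
    """Stringent metric: a prediction counts only if it matches the reference answer exactly."""
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

def answer_overlap(train_answers, test_answers):
    """Fraction of test-set answers already present in the training answer vocabulary."""
    vocab = {a.strip().lower() for a in train_answers}
    return sum(a.strip().lower() in vocab for a in test_answers) / len(test_answers)

# Toy example: a model trained on one benchmark (IID) and evaluated on another (OOD).
train_answers = ["yes", "no", "2", "red", "dog"]    # answers seen during training
test_answers  = ["yes", "blue", "3", "dog"]         # answers in the OOD test set
predictions   = ["yes", "blue", "2", "cat"]         # hypothetical model outputs

print(answer_overlap(train_answers, test_answers))      # 0.5
print(exact_match_accuracy(predictions, test_answers))  # 0.5

Low answer overlap caps the accuracy a discriminative model with a fixed answer vocabulary can reach on the OOD benchmark, which is one reason the IID-to-OOD drop is studied alongside it.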
“…Domain adaptation in VQA. Some studies (Jabri et al, 2016; Chao et al, 2018; Zhang et al, 2021) have explored domain adaptation of VQA models from one VQA benchmark to another. Our focus, instead, is on evaluating zero-shot cross-benchmark generalization without any adaptation.…”
Section: Related Work (mentioning, confidence: 99%)
“…However, these VQA datasets build a closed world that is not designed to generalize to real-world images. Remarkably, some recent work has managed to show domain transfer from cartoon images to real images [67], but there is still a limit on how much can be learned from these existing resources. Our proposed Hypersim-VQA and ThreeDWorld-VQA datasets provide a promising alternative that more realistically captures real-world settings and offers a path forward in this direction.…”
Section: Introduction (mentioning, confidence: 99%)
“…[67] Two-stage DA; in both cases the first number in E refers to the training epoch parameter for the AE. For Domain Independent, di tokens is the additional output we use for the synthetic answer tokens.

Adversarial:        lr = 15e-4, E = 100 + 13, O_wd = 1e-6, O = Adam, O = 1e-4, O_β = (0.8, 0.8), α = 2 / (1 + exp(−10·p)) − 1
MMD:                lr = 1e-3,  E = 150 + 13, O_wd = 1e-4, O = Adam, O = 1e-4, O_β = (0.8, 0.8), α = 0.4, β = 0.6
Domain Independent: lr = 15e-4, E = 13, O_wd = 0.2, O = Adam, O = 1e-9, O_β = (0.9, 0.9), di tokens = 100, β = 1
F-SWAP:             lr = 15e-4, E = 13, O_wd = 1e-1, O = Adam, O = 1e-9, O_β = (0.9, 0.98), λ = 0.2…”
(mentioning, confidence: 99%)
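The α entry in the Adversarial row above matches the coefficient ramp commonly used with gradient-reversal (DANN-style) adversarial domain adaptation. A minimal sketch of that schedule, assuming p is the fraction of training completed (a detail not stated in the excerpt):

import math

def adversarial_alpha(p: float) -> float:
    # alpha = 2 / (1 + exp(-10 * p)) - 1, assumed to ramp with training progress p in [0, 1];
    # it starts at 0 and smoothly saturates toward 1.
    return 2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, round(adversarial_alpha(p), 3))   # 0.0, 0.848, 0.987, 1.0

Ramping the adversarial weight this way keeps the domain discriminator's gradient from dominating early training, before the task features are useful.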
“…With the introduction of search-by-image features by major internet search engines, IBL has attracted widespread attention. In addition, several academic fields show strong enthusiasm for IBL research: object detection [1-3], visual localization [4][5][6], simultaneous localization and mapping (SLAM) [7], etc.…”
Section: Introduction (mentioning, confidence: 99%)