Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.240

‘Just because you are right, doesn’t mean I am wrong’: Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks

Abstract: GQA (Hudson and Manning, 2019) is a dataset for real-world visual reasoning and compositional question answering. We found that many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but are still semantically meaningful and correct in the given context. In fact, this is the case with most existing visual question answering (VQA) datasets, which assume only one ground-truth answer for each question. We propose Alternative Answer Sets (AAS) of ground…
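The abstract's core idea, scoring a prediction against a set of acceptable answers rather than a single ground-truth string, can be illustrated with a minimal sketch. The helper names, normalization rules, and example answer set below are assumptions for illustration, not the authors' released AAS resource.

```python
# Minimal sketch of evaluation with an alternative answer set (AAS).
# The answer-set contents and helper names are illustrative only.

def normalize(answer: str) -> str:
    """Lowercase and drop articles, as is common in VQA answer scoring."""
    tokens = [t for t in answer.lower().strip().split() if t not in {"a", "an", "the"}]
    return " ".join(tokens)

def aas_accuracy(prediction: str, answer_set: set[str]) -> float:
    """Score 1.0 if the prediction matches any answer in the set, else 0.0."""
    normalized_set = {normalize(a) for a in answer_set}
    return 1.0 if normalize(prediction) in normalized_set else 0.0

# Example: the single GQA ground truth is "couch", but "sofa" is also correct.
print(aas_accuracy("sofa", {"couch", "sofa"}))  # 1.0 with an alternative answer set
print(aas_accuracy("sofa", {"couch"}))          # 0.0 under single-answer exact match
```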

Cited by 7 publications (2 citation statements)
References 15 publications
“…Alternatively, Risch et al (2021) use a cross-encoder to measure the semantic similarity between predictions and gold answers. For the visual QA task, Luo et al (2021) incorporate alias answers in visual QA evaluation. In this work, instead of proposing new evaluation metrics, we improve the evaluation of ODQA models by augmenting gold answers with alias from knowledge bases.…”
Section: Experiment: Just As Sweet
confidence: 99%
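The cross-encoder comparison mentioned in the statement above can be sketched with the sentence-transformers library. The specific model name, threshold, and wrapper function below are assumptions for illustration, not the cited paper's configuration.

```python
# Hypothetical cross-encoder scoring of semantic similarity between a predicted
# answer and a gold answer, in the spirit of the citing work.
from sentence_transformers import CrossEncoder

# An off-the-shelf STS cross-encoder; outputs a similarity score roughly in [0, 1].
model = CrossEncoder("cross-encoder/stsb-roberta-base")

def semantic_match(prediction: str, gold: str, threshold: float = 0.5) -> bool:
    """Accept the prediction if its similarity to the gold answer exceeds a threshold."""
    score = model.predict([(prediction, gold)])[0]
    return bool(score >= threshold)

print(semantic_match("sofa", "couch"))   # likely True: near-synonyms score highly
print(semantic_match("sofa", "banana"))  # likely False
```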
“…There are numerous reasoning VQA methods [1,6,12,14,19,21,37,38,40,52,55,58,59,64] that focus on learning the relations between visual regions and words in questions implicitly, e.g., through message passing [50], pairwise relationship modeling [4], adversarial learning [8,32,51], or graph parsing methods defined by inter/intra-class edges [15]. Other works focus on leveraging external information [18] or explicit scene graph [5] to extract features from input images.…”
Section: Related Work
confidence: 99%