‘Just because you are right, doesn’t mean I am wrong’: Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks

Luo, Man; Sampat, Shailaja Keyur; Tallman, Riley; Zeng, Yankai; Vancha, Manuha; Sajja, Akarshan; Baral, Chitta

doi:10.18653/v1/2021.eacl-main.240

Cited by 7 publications

(2 citation statements)

References 15 publications

“…Alternatively, Risch et al (2021) use a cross-encoder to measure the semantic similarity between predictions and gold answers. For the visual QA task, Luo et al (2021) incorporate alias answers in visual QA evaluation. In this work, instead of proposing new evaluation metrics, we improve the evaluation of ODQA models by augmenting gold answers with alias from knowledge bases.…”

Section: Experiment: Just As Sweet (mentioning)

confidence: 99%

What's in a Name? Answer Equivalence For Open-Domain Question Answering

Zhao, Boyd-Graber. 2021. Preprint.

A flaw in QA evaluation is that annotations often only provide one gold answer. Thus, model predictions semantically equivalent to the answer but superficially different are considered incorrect. This work explores mining alias entities from knowledge bases and using them as additional gold answers (i.e., equivalent answers). We incorporate answers for two settings: evaluation with additional answers and model training with equivalent answers. We analyse three QA benchmarks: Natural Questions, TriviaQA and SQuAD. Answer expansion increases the exact match score on all datasets for evaluation, while incorporating it helps model training over real-world datasets. We ensure the additional answers are valid through a human post hoc evaluation.
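
The idea in this abstract, scoring a prediction against the gold answer plus its aliases, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the normalization rules follow the common SQuAD-style convention, and `ALIAS_TABLE`, `expand_with_aliases`, and the example entries are hypothetical stand-ins, not the authors' code or data.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


def expand_with_aliases(gold_answers: list[str],
                        alias_table: dict[str, list[str]]) -> list[str]:
    """Return the gold answers plus any aliases listed for them.

    alias_table is a hypothetical lookup keyed by normalized answer text;
    in practice it would be mined from a knowledge base.
    """
    expanded = list(gold_answers)
    for gold in gold_answers:
        for alias in alias_table.get(normalize(gold), []):
            if alias not in expanded:
                expanded.append(alias)
    return expanded


# Hypothetical alias entries, for illustration only.
ALIAS_TABLE = {"new york city": ["NYC", "New York", "The Big Apple"]}

gold = ["New York City"]
expanded_gold = expand_with_aliases(gold, ALIAS_TABLE)

strict = exact_match("NYC", gold)             # single gold answer only
lenient = exact_match("NYC", expanded_gold)   # gold answer plus aliases
```

With the single gold answer, the prediction "NYC" counts as wrong; with the expanded set it counts as correct, which is the effect on exact match that the abstract describes.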

“…There are numerous reasoning VQA methods [1,6,12,14,19,21,37,38,40,52,55,58,59,64] that focus on learning the relations between visual regions and words in questions implicitly, e.g., through message passing [50], pairwise relationship modeling [4], adversarial learning [8,32,51], or graph parsing methods defined by inter/intra-class edges [15]. Other works focus on leveraging external information [18] or explicit scene graph [5] to extract features from input images.…”

Section: Related Work (mentioning)

confidence: 99%

Coarse-to-Fine Reasoning for Visual Question Answering

Nguyen, Do, Tran, et al. 2021. Preprint.

Bridging the semantic gap between image and question is an important step to improve the accuracy of the Visual Question Answering (VQA) task. However, most of the existing VQA methods focus on attention mechanisms or visual relations for reasoning the answer, while the features at different semantic levels are not fully utilized. In this paper, we present a new reasoning framework to fill the gap between visual features and semantic clues in the VQA task. Our method first extracts the features and predicates from the image and question. We then propose a new reasoning framework to effectively jointly learn these features and predicates in a coarse-to-fine manner. The intensively experimental results on three large-scale VQA datasets show that our proposed approach achieves superior accuracy comparing with other state-of-the-art methods. Furthermore, our reasoning framework also provides an explainable way to understand the decision of the deep neural network when predicting the answer. Our source code and trained models are available at https://github.com/aioz-ai/CRF_VQA
