Cycle-Consistency for Robust Visual Question Answering

Shah, Meet; Chen, Xinlei; Rohrbach, Marcus; Parikh, Devi

doi:10.1109/cvpr.2019.00681

Cited by 149 publications

(159 citation statements)

References 30 publications

Supporting

Mentioning

159

Contrasting


“…Keeping the visual input unchanged can allow natural language semantic understanding to be better studied. Recent works have done this by rephrasing queries (Shah et al, 2019). To some extent, this can be done automatically by merging/negating existing queries, replacing words with synonyms, and introducing distractors.…”

Section: Addressing Shortcomings (mentioning)

confidence: 99%

Challenges and Prospects in Vision and Language Research

Kafle

Shrestha

Kanan

2019

Front. Artif. Intell.


Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, rather than behaving as visual Turing tests, recent studies have demonstrated state-of-the-art systems are achieving good performance through flaws in datasets and evaluation procedures. We review the current state of affairs and outline a path forward.

“…While useful, these do not take the relationship between predictions into account, and thus do not capture problems like the ones in Figure 1. Exceptions exist when trying to gauge robustness: Ribeiro et al (2018) consider the robustness of QA models to automatically generated input rephrasings, while Shah et al (2019) evaluate VQA models on crowdsourced rephrasings for robustness. While important for evaluation, these efforts are orthogonal to our focus on consistency.…”

Section: Related Work (mentioning)

confidence: 99%

Are Red Roses Red? Evaluating Consistency of Question-Answering Models

Ribeiro

Guestrin

Singh

2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics


Although current evaluation of question-answering systems treats predictions in isolation, we need to consider the relationship between predictions to measure true understanding. A model should be penalized for answering "no" to "Is the rose red?" if it answers "red" to "What color is the rose?". We propose a method to automatically extract such implications for instances from two QA datasets, VQA and SQuAD, which we then use to evaluate the consistency of models. Human evaluation shows these generated implications are well formed and valid. Consistency evaluation provides crucial insights into gaps in existing models, and retraining with implication-augmented data improves consistency on both synthetic and human-generated implications.

[Figure excerpt: (b) Model (Zhang et al., 2018) provides inconsistent answers, e.g. "How many birds?" A: 1; "Is there 1 bird?" A: no; "Are there 2 birds?" A: yes; "Are there any birds?" A: no. (c) Excerpt from an input paragraph, SQuAD dataset.]
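The consistency check described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the `consistency_rate` function, the tuple layout, and the lookup-table "model" are all hypothetical names chosen for illustration, and the implications are written by hand rather than extracted automatically.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical layout: (question, answer the model gave,
#                       implied question, answer that the first answer entails)
Implication = Tuple[str, str, str, str]


def consistency_rate(
    answer_fn: Callable[[str], str],
    implications: Iterable[Implication],
) -> float:
    """Fraction of implied questions answered as the original answer entails."""
    checked = 0
    consistent = 0
    for question, answer, implied_q, implied_a in implications:
        # Only score implications whose premise matches the model's own answer.
        if answer_fn(question) != answer:
            continue
        checked += 1
        if answer_fn(implied_q) == implied_a:
            consistent += 1
    return consistent / checked if checked else 0.0


# Toy stand-in for a QA model (an assumption): a fixed lookup table.
toy_answers = {
    "What color is the rose?": "red",
    "Is the rose red?": "no",  # inconsistent with "red"
    "How many birds?": "1",
    "Is there 1 bird?": "yes",  # consistent with "1"
}


def toy_model(question: str) -> str:
    return toy_answers.get(question, "unknown")


implications = [
    ("What color is the rose?", "red", "Is the rose red?", "yes"),
    ("How many birds?", "1", "Is there 1 bird?", "yes"),
]

rate = consistency_rate(toy_model, implications)
print(rate)
```

With the toy table above, one of the two implications is violated, so the printed rate is 0.5. Skipping implications whose premise the model did not produce is a design choice of this sketch, made so that the score reflects agreement between the model's own answers.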


“…Second, we collect human-annotated QA pairs based on common-sense in addition to the logic-based QA's. The most relevant work to ours is Shah et al (2019). However, they focus strictly on question paraphrases that maintains the same answers as the source question.…”

Section: Related Work (mentioning)

confidence: 99%

“…Checking for consistency can be considered as an interrogative Turing Test (Radziwill and Benton, 2017) for linguistic robustness (Stede, 1992). Works such as Xu et al (2018) explore the robustness of VQA with respect to image variations, whereas works such as Ray et al (2016) and Mahendru et al (2017) focus on the understanding of the premise of a question instead of relying on dataset biases (Agrawal et al, 2017) (Goyal et al, 2017) or linguistic biases (Ramakrishnan et al, 2018) … Shah et al (2019). However, they focus strictly on question paraphrases that maintains the same answers as the source question.…”

Section: Related Work (mentioning)

confidence: 99%

Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation

Ray

Sikka

Divakaran

et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


While models for Visual Question Answering (VQA) have steadily improved over the years, interacting with one quickly reveals that these models lack consistency. For instance, if a model answers "red" to "What color is the balloon?", it might answer "no" if asked, "Is the balloon red?". These responses violate simple notions of entailment and raise questions about how effectively VQA models ground language. In this work, we introduce a dataset, ConVQA, and metrics that enable quantitative evaluation of consistency in VQA. For a given observable fact in an image (e.g. the balloon's color), we generate a set of logically consistent question-answer (QA) pairs (e.g. Is the balloon red?) and also collect a human-annotated set of common-sense based consistent QA pairs (e.g. Is the balloon the same color as tomato sauce?). Further, we propose a consistency-improving data augmentation module, a Consistency Teacher Module (CTM). CTM automatically generates entailed (or similar-intent) questions for a source QA pair and fine-tunes the VQA model if the VQA's answer to the entailed question is consistent with the source QA pair. We demonstrate that our CTM-based training improves the consistency of VQA models on the ConVQA datasets and is a strong baseline for further research.
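To make the idea of scoring consistency over a set of QA pairs tied to one observable fact more concrete, here is a minimal sketch. It shows one plausible way to score such sets and is not necessarily the metric defined in the paper: `set_consistency`, the metric names, the fact labels, and the toy model are assumptions introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple

QAPair = Tuple[str, str]  # (question, expected answer)


def set_consistency(
    answer_fn: Callable[[str], str],
    qa_sets: Dict[str, List[QAPair]],
) -> Dict[str, float]:
    """Score per-pair accuracy and the share of sets answered entirely correctly."""
    pair_hits = 0
    pair_total = 0
    fully_consistent_sets = 0
    for _fact, pairs in qa_sets.items():
        hits = sum(1 for question, expected in pairs if answer_fn(question) == expected)
        pair_hits += hits
        pair_total += len(pairs)
        # A set counts as consistent only if every pair in it is answered correctly.
        if pairs and hits == len(pairs):
            fully_consistent_sets += 1
    return {
        "pair_accuracy": pair_hits / pair_total if pair_total else 0.0,
        "fully_consistent": fully_consistent_sets / len(qa_sets) if qa_sets else 0.0,
    }


# Hypothetical QA sets, one per observable fact in an image.
qa_sets = {
    "balloon color": [
        ("What color is the balloon?", "red"),
        ("Is the balloon red?", "yes"),
        ("Is the balloon the same color as tomato sauce?", "yes"),
    ],
    "weather": [
        ("Is it sunny outside?", "yes"),
        ("Is it dark outside?", "no"),
    ],
}

# Toy stand-in for a VQA model (an assumption): a fixed lookup table.
toy_answers = {
    "What color is the balloon?": "red",
    "Is the balloon red?": "no",  # contradicts the answer above
    "Is the balloon the same color as tomato sauce?": "yes",
    "Is it sunny outside?": "yes",
    "Is it dark outside?": "no",
}


def toy_model(question: str) -> str:
    return toy_answers.get(question, "unknown")


scores = set_consistency(toy_model, qa_sets)
print(scores)
```

In this toy run the model answers four of five questions correctly, yet only one of the two sets is fully consistent, which is the kind of gap that per-question accuracy alone would hide.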

