Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1128

Do explanations make VQA models more predictable to a human?

Abstract: A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable 'explanations' of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze whether existing explanations indeed make a VQA model (its responses as well as its failures) more predictable to a human. Surprisingly, we find that they do not. On the other hand, we find that human-in-the-loop approaches that treat the model as a black box do.

Cited by 49 publications (69 citation statements)
References 22 publications
“…Even for well-defined tasks such as VQA, answers to questions like “Is it sunny?” can be inferred using multiple image regions. Indeed, inclusion of attention maps does not make a model more predictable for human observers (Chandrasekaran et al., 2018), and attention-based models and humans do not look at the same image regions (Das et al., 2016). This suggests attention maps are an unreliable means of conveying interpretable predictions.…”
Section: Shortcomings of VandL Research
confidence: 99%
“…However, learning to predict explanations can suffer from many of the same problems faced by image captioning: evaluation is difficult and there can be multiple valid explanations. Currently, there is no reliable evidence that such explanations actually make the model more interpretable, but there is some evidence to the contrary (Chandrasekaran et al., 2018).…”
Section: Shortcomings of VandL Research
confidence: 99%
“…Usefulness of Explanations Finally, other work studies how useful interpretations are for humans. and Lai and Tan (2019) show that text interpretations can provide benefits to humans, while Chandrasekaran et al. (2018) show that explanations for visual QA models provided limited benefit. We present a method that enables adversaries to manipulate interpretations, which can have dire consequences for real-world users (Lakkaraju and Bastani, 2020).…”
Section: Natural Failures of Interpretation Methods
confidence: 99%
“…c. ESIM+ELMo [10]: ESIM is another high-performing model for sentence-pair classification tasks, particularly when used with ELMo embeddings [57]. We follow the standard train, val and test splits. VQA Baselines Additionally we compare our approach to models developed on the VQA dataset [5].…”
Section: Baselines
confidence: 99%