2018
DOI: 10.1007/978-3-030-01216-8_17

Grounding Visual Explanations

Abstract: Existing visual explanation generation agents learn to fluently justify a class prediction. However, they may mention visual attributes which reflect a strong class prior, even though the evidence may not actually be in the image. This is particularly concerning, as such agents ultimately fail to build trust with human users. To overcome this limitation, we propose a phrase-critic model to refine generated candidate explanations, augmented with flipped phrases which we use as negative examples during training. At…
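As a rough illustration of the phrase-critic idea described in the abstract, the sketch below scores how well a candidate explanation phrase is grounded in image features and trains the scorer with flipped phrases as negatives via a margin ranking loss. This is a minimal sketch under assumed feature dimensions; the class and variable names (PhraseCritic, img_feat, phrase_emb) are hypothetical and do not reflect the paper's actual implementation.

```python
# Hypothetical phrase-critic sketch: rank grounded phrases above flipped negatives.
import torch
import torch.nn as nn

class PhraseCritic(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, hidden=512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, img_feat, phrase_emb):
        # Concatenate visual and phrase features; output a grounding score.
        return self.scorer(torch.cat([img_feat, phrase_emb], dim=-1)).squeeze(-1)

critic = PhraseCritic()
loss_fn = nn.MarginRankingLoss(margin=1.0)

# Toy batch: image features, embeddings of grounded phrases, and embeddings of
# flipped (negative) phrases, e.g. "red beak" -> "black beak".
img = torch.randn(8, 2048)
pos_phrase = torch.randn(8, 300)
neg_phrase = torch.randn(8, 300)

pos_score = critic(img, pos_phrase)
neg_score = critic(img, neg_phrase)
# Target = 1: grounded phrases should score higher than flipped negatives.
loss = loss_fn(pos_score, neg_score, torch.ones_like(pos_score))
loss.backward()
```

At inference time, such a critic could re-rank candidate explanations so that only phrases with actual visual evidence in the image are kept, in the spirit of the refinement step the abstract describes.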

Cited by 171 publications (127 citation statements)
References 40 publications
“…Task-specific caption generation. While most image captioning works focus on the generic task of obtaining image-relevant descriptions [3,12,58], some recent works explore pragmatic or "task-specific" captions. Some focus on generating textual explanations for deep models' predictions [19,20,45]. Others aim to generate a discriminative caption for an image or image region, to disambiguate it from a distractor [4,10,40,39,56,62].…”
Section: Related Work
Confidence: 99%
“…They can be especially useful when tailored to a specific task or objective, such as explaining the model's predictions [20,45] or generating non-ambiguous referring expressions for specific image regions [40,62].…”
Section: Introduction
Confidence: 99%
“…In the counterfactual visual explanation work [22], a patch-based edit of the input image is optimized to satisfy the intended change in the prediction. In the grounded visual explanation work [23], text-based explanations are generated to provide counterfactual explanations for the image classification task. Besides causal interpretation methods, as demonstrated in [9], examining the relationship between the trained model and the training dataset can also help interpret model behavior.…”
Section: Related Work
Confidence: 99%
“…Few works have treated multimodal, i.e., visual and linguistic, explanations [21,2]. While they provide visual information by referring to a part of the target samples, we explore a method that utilizes other examples for explanation.…”
Section: Related Work
Confidence: 99%