Proceedings of the 3rd Workshop on Neural Generation and Translation 2019
DOI: 10.18653/v1/d19-5624

Interrogating the Explanatory Power of Attention in Neural Machine Translation

Abstract: Attention models have become a crucial component in neural machine translation (NMT). They are often implicitly or explicitly used to justify the model's decision in generating a specific token, but it has not yet been rigorously established to what extent attention is a reliable source of information in NMT. To evaluate the explanatory power of attention for NMT, we examine the possibility of yielding the same prediction but with counterfactual attention models that modify crucial aspects of the trained attent…
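The core experimental idea in the abstract, replacing the trained attention distribution with a counterfactual one and checking whether the model's prediction survives, can be illustrated with a toy sketch. The code below is not the authors' implementation; the decoder step, dimensions, and weight matrices are illustrative stand-ins for a trained NMT model.

import numpy as np

rng = np.random.default_rng(0)

def decode_step(attn, enc_states, W_out):
    # One toy decoder step: context = attention-weighted sum of encoder states,
    # logits = context projected onto the vocabulary, prediction = argmax.
    context = attn @ enc_states            # shape (d_model,)
    logits = W_out @ context               # shape (vocab,)
    return int(np.argmax(logits))

src_len, d_model, vocab = 7, 16, 100
enc_states = rng.normal(size=(src_len, d_model))   # frozen "encoder states" (illustrative)
W_out = rng.normal(size=(vocab, d_model))          # frozen output projection (illustrative)

attn = rng.random(src_len)
attn /= attn.sum()                                  # stand-in for the trained attention weights

original = decode_step(attn, enc_states, W_out)
uniform = decode_step(np.full(src_len, 1.0 / src_len), enc_states, W_out)
permuted = decode_step(rng.permutation(attn), enc_states, W_out)

print("same prediction under uniform attention: ", original == uniform)
print("same prediction under permuted attention:", original == permuted)

In the paper's setting the counterfactual distributions are applied inside a trained NMT model rather than to random matrices; the sketch only shows the comparison logic.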

Cited by 12 publications (10 citation statements) | References 35 publications
“…While most of these works provide evidence that attention weights are not always faithful, Moradi et al. (2019) confirm similar observations on the unfaithful nature of attention in the context of NMT models. Li et al. (2020) is one of the few papers examining attention models in NMT.…”
Section: Related Work (supporting)
confidence: 75%
“…It is worth noting that the increase in faithfulness of attention-based explanations for function words is much larger than that for content words. This can be attributed to the fact that function words are mostly generated using target-side information in the decoder (Tu et al., 2017; Moradi et al., 2019), and manipulating attention does not have much effect on generating them. However, our proposed faithfulness objective (F_faith) seems to tighten the dependence of the decoder on the attention component.…”
Section: Impact on Faithfulness (mentioning)
confidence: 99%
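A hedged sketch of how such a function-word vs. content-word comparison could be tallied: perturb attention at each decoding step, record whether the predicted token changes, and aggregate the flip rate per word class. The word list and record format below are hypothetical stand-ins, not the cited papers' setup.

FUNCTION_WORDS = {"the", "a", "of", "to", "and", "in", "is"}

def flip_rate(records):
    # records: list of (generated_token, prediction_changed_under_perturbation) pairs.
    counts = {"function": [0, 0], "content": [0, 0]}   # [changed, total]
    for token, changed in records:
        key = "function" if token.lower() in FUNCTION_WORDS else "content"
        counts[key][0] += int(changed)
        counts[key][1] += 1
    return {k: (c / t if t else 0.0) for k, (c, t) in counts.items()}

# Example: a low flip rate for function words is what low attention faithfulness
# (predictions driven by the decoder, not by attention) would look like.
records = [("the", False), ("cat", True), ("of", False), ("ran", True)]
print(flip_rate(records))   # {'function': 0.0, 'content': 1.0}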
“…Attention mechanisms learn to assign soft weights to (usually contextualized) token representations, and so one can extract highly weighted tokens as rationales. However, attention weights do not in general provide faithful explanations for predictions (Jain and Wallace, 2019; Serrano and Smith, 2019; Wiegreffe and Pinter, 2019; Zhong et al., 2019; Pruthi et al., 2020; Brunner et al., 2020; Moradi et al., 2019; Vashishth et al., 2019). This likely owes to encoders entangling inputs, which complicates interpreting attention weights over contextualized representations of those inputs.…”
Section: Related Work (mentioning)
confidence: 99%
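The rationale-extraction idea described in this statement, treating the most highly weighted source tokens as the explanation for a prediction, can be sketched in a few lines; the tokens and weights below are made up for illustration.

import numpy as np

def attention_rationale(tokens, attn_weights, k=2):
    # Return the k tokens with the largest attention weight.
    top = np.argsort(attn_weights)[::-1][:k]
    return [tokens[i] for i in top]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
attn = np.array([0.05, 0.40, 0.10, 0.05, 0.05, 0.35])
print(attention_rationale(tokens, attn))   # ['cat', 'mat']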
“…Depending on the task and model architecture, attention may have more or less explanatory power for model predictions [35, 51, 57, 71, 79]. Visualization techniques have been used to convey the structure and properties of attention in Transformers [31, 40, 80, 82].…”
Section: Interpreting Models in NLP (mentioning)
confidence: 99%
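As a minimal illustration of the attention visualizations referenced here, the following sketch plots a source-by-target attention matrix as a heatmap; the matrix is random placeholder data, not weights from a trained Transformer.

import numpy as np
import matplotlib.pyplot as plt

src = ["das", "Haus", "ist", "klein"]
tgt = ["the", "house", "is", "small"]
# Each target row is a distribution over source tokens (rows sum to 1).
attn = np.random.default_rng(0).dirichlet(np.ones(len(src)), size=len(tgt))

plt.imshow(attn, cmap="viridis", aspect="auto")
plt.xticks(range(len(src)), src)
plt.yticks(range(len(tgt)), tgt)
plt.xlabel("source tokens")
plt.ylabel("target tokens")
plt.colorbar(label="attention weight")
plt.title("Toy attention heatmap")
plt.show()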