Proceedings of the 5th Workshop on Representation Learning for NLP 2020
DOI: 10.18653/v1/2020.repl4nlp-1.17
Staying True to Your Word: (How) Can Attention Become Explanation?

Abstract: The attention mechanism has quickly become ubiquitous in NLP. In addition to improving model performance, attention has been widely used as a glimpse into the inner workings of NLP models. The latter aspect has in recent years become a common topic of discussion, most notably in the work of Jain and Wallace (2019) and Wiegreffe and Pinter (2019). With the shortcomings of using attention weights as a tool of transparency revealed, the attention mechanism has been stuck in a limbo without concrete proof of when and w…

Cited by 22 publications (43 citation statements). References 10 publications.
“…A strong relationship on its own may validate the use of sparse attention, as the ability to identify a subset of influential intermediate representations would then directly translate to a set of influential inputs. Previous works show that the "contribution" of a token x_i to its intermediate representation h_i is often quite low for various model architectures (Salehinejad et al., 2017; Ming et al., 2017; Brunner et al., 2020; Tutek and Snajder, 2020). In the context of attention, we find this property to be evinced by the adversarial experiments of Wiegreffe and Pinter (2019) (§4) and Jain and Wallace (2019) (§4), which we verify in App.…”
Section: Methods (supporting)
confidence: 78%
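The "contribution" of a token x_i to its own hidden state h_i can be probed with a simple erasure test. The following is a minimal NumPy sketch, not code from any of the cited papers: it assumes a toy tanh RNN with random weights and measures, for each position, how much h_i changes when x_i is zeroed out.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_states(X, W_x, W_h):
    # Simple tanh RNN; returns the hidden state h_i after each token x_i.
    h = np.zeros(W_h.shape[0])
    states = []
    for x in X:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return states

d_in, d_h, T = 8, 16, 10
W_x = rng.normal(scale=0.3, size=(d_h, d_in))
W_h = rng.normal(scale=0.3, size=(d_h, d_h))
X = rng.normal(size=(T, d_in))

base = rnn_states(X, W_x, W_h)

# Erasure-style "contribution" of token x_i to its own hidden state h_i:
# zero out x_i and measure the relative change in h_i.
contributions = []
for i in range(T):
    X_erased = X.copy()
    X_erased[i] = 0.0
    erased = rnn_states(X_erased, W_x, W_h)
    rel_change = np.linalg.norm(base[i] - erased[i]) / np.linalg.norm(base[i])
    contributions.append(rel_change)
```

A low relative change means h_i is dominated by the recurrent context rather than by the token it nominally represents, which is the property the quoted statement refers to.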
“…Using representation erasure, they show that the resulting attention weights lead to decision flips more easily than vanilla attention does. With a similar motivation, Tutek and Snajder (2020) use a word-level objective to achieve a stronger connection between hidden states and the words they represent, which affects attention. Outside the recent debate, Deng et al. (2018) propose variational attention as an alternative to the soft attention of Bahdanau et al. (2015), arguing that the latter is not alignment but only an approximation thereof.…”
Section: Can Attention Be Improved? (mentioning)
confidence: 99%
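The representation-erasure test mentioned above can be illustrated with a toy attention classifier. This is a hedged NumPy sketch with random weights, not the cited authors' setup: it removes the most-attended hidden state and checks whether the argmax prediction flips.

```python
import numpy as np

rng = np.random.default_rng(1)

def predict(H, w_att, W_out, keep=None):
    # Dot-product attention over hidden states H, then a linear classifier.
    scores = H @ w_att
    if keep is not None:
        scores = np.where(keep, scores, -1e9)  # mask erased positions
    a = np.exp(scores - scores.max())
    a /= a.sum()
    context = a @ H
    return W_out @ context, a

T, d_h, n_cls = 12, 16, 3
H = rng.normal(size=(T, d_h))
w_att = rng.normal(size=d_h)
W_out = rng.normal(size=(n_cls, d_h))

logits, a = predict(H, w_att, W_out)
pred = int(logits.argmax())

# Erase the most-attended hidden state and see if the decision flips.
keep = np.ones(T, dtype=bool)
keep[a.argmax()] = False
logits_erased, _ = predict(H, w_att, W_out, keep)
flipped = int(logits_erased.argmax()) != pred
```

If high-attention positions are truly influential, erasing them should flip decisions often; comparing flip rates between training regimes is the gist of the evaluation described in the quote.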
“…The above works have inspired some to find ways to make attention more faithful and/or plausible by changing the nature of the hidden representations attention is computed over, using special training objectives (e.g., Mohankumar et al., 2020; Tutek and Snajder, 2020). Others have proposed replacing the attention mechanism with a latent alignment model (Deng et al., 2018).…”
Section: Introduction (mentioning)
confidence: 99%
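One family of such training objectives ties each hidden state to the word it represents. The sketch below is a toy NumPy illustration of that general idea, not the exact objective from Tutek and Snajder (2020): an auxiliary squared-error loss pulls hidden states toward the corresponding word embeddings, shown with one illustrative gradient step.

```python
import numpy as np

rng = np.random.default_rng(2)

T, d = 6, 16
E = rng.normal(size=(T, d))  # word embeddings e_i (hypothetical values)
H = rng.normal(size=(T, d))  # hidden states h_i from some encoder

def tying_loss(H, E):
    # Auxiliary objective: mean squared distance between each hidden
    # state and the embedding of the word it represents.
    return float(np.mean(np.sum((H - E) ** 2, axis=1)))

loss = tying_loss(H, E)

# One illustrative gradient step on H (in practice the encoder's
# parameters would be updated by backpropagation, not H directly).
grad_H = 2.0 * (H - E) / T
H_new = H - 0.1 * grad_H
loss_new = tying_loss(H_new, E)
```

Optimizing such a loss alongside the task loss strengthens the link between hidden states and their input tokens, which in turn changes what attention over those states can reveal.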
“…Another line of work aims to make attention more indicative of the inputs' importance (Kitada and Iyatomi, 2020; Tutek and Snajder, 2020; Mohankumar et al., 2020); these methods are designed for analysis, with no significant performance gain, while our methods incorporate the analytical results to enhance NMT performance.…”
Section: Related Work (mentioning)
confidence: 99%