Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.574

Attention is Not Only a Weight: Analyzing Transformers with Vector Norms

Abstract: Attention is a key component of Transformers, which have recently achieved considerable success in natural language processing. Hence, attention is being extensively studied to investigate various linguistic capabilities of Transformers, focusing on analyzing the parallels between attention weights and specific linguistic phenomena. This paper shows that attention weights alone are only one of the two factors that determine the output of attention and proposes a norm-based analysis that incorporates the second…
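
As the citation excerpts below describe, the norm-based analysis measures the norm of the attention-weighted, value-transformed input vectors rather than the attention weight alone. The following is a minimal, self-contained sketch of that idea, not the authors' implementation; the single-head simplification, toy dimensions, random inputs, and variable names (W_V, W_O, A, F) are assumptions made purely for illustration.

```python
# Sketch of weight-based vs. norm-based attention analysis (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8

X = rng.normal(size=(seq_len, d_model))            # toy input vectors x_j
W_V = rng.normal(size=(d_model, d_head))           # value projection
W_O = rng.normal(size=(d_head, d_model))           # output projection
A = rng.dirichlet(np.ones(seq_len), size=seq_len)  # attention weights alpha_{i,j}, rows sum to 1

F = X @ W_V @ W_O                                  # transformed vectors f(x_j)

weight_view = A                                    # weight-based view: alpha_{i,j} alone
norm_view = A * np.linalg.norm(F, axis=-1)         # norm-based view: ||alpha_{i,j} f(x_j)||

print(weight_view[0].round(2))   # what token 0 attends to, by weights
print(norm_view[0].round(2))     # what actually reaches token 0, by vector norms
```

Because ||alpha_{i,j} f(x_j)|| factors as alpha_{i,j} times ||f(x_j)||, a large weight on a token whose transformed vector has a small norm contributes little to the attention output, which is how the norm-based view can downweight special tokens even when their attention weights are large.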

Cited by 114 publications (111 citation statements) | References 23 publications (50 reference statements)

“…More recently, Kobayashi et al (2020) showed that the norms of attention-weighted input vectors, which yield a more intuitive interpretation of self-attention, reduce the attention to special tokens. However, even when the attention weights are normed, it is still not the case that most heads that do the “heavy lifting” are even potentially interpretable (Prasanna et al, 2020).…”
Section: Attention To Special Tokens (mentioning)
confidence: 99%
“…As an alternative to analyzing attention weights, Kobayashi et al (2020) propose analyzing the norm of the vectors produced by multiplying the outputs of the value matrix with the attention weights. Following the experimental setting of Clark et al (2019), i.e., by analyzing 992 sequences extracted from Wikipedia, their norm-based analysis also shows that the contributions of [SEP] and punctuation are actually small.…”
Section: Discussion (mentioning)
confidence: 99%
“…Among the work that is relevant to encoder-decoder attention, Michel et al (2019) and Voita et al (2019) observe that only a small portion of heads is relevant for translation and that encoder-decoder attention tends to be more important than self-attention. Meanwhile, word alignments for machine translation are induced from encoder-decoder attention weights (Li et al, 2019; Kobayashi et al, 2020). However, none of the prior work employs attention to improve generation quality.…”
Section: Related Work (mentioning)
confidence: 99%
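
The alignment-induction use mentioned in the excerpt above can be sketched with a simple heuristic: align each target position to the source position receiving the highest cross-attention score. This is an illustrative assumption, not the specific procedure of the cited works; the function name induce_alignments and the toy matrix are hypothetical.

```python
# Hypothetical sketch: inducing word alignments from encoder-decoder attention.
import numpy as np

def induce_alignments(cross_attn: np.ndarray) -> list[tuple[int, int]]:
    """Align each target position t to the source position with the largest score."""
    return [(t, int(np.argmax(row))) for t, row in enumerate(cross_attn)]

# Toy (target_len x source_len) matrix of cross-attention weights (or
# norm-based contributions), e.g. averaged over the heads of one layer.
cross_attn = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.2, 0.3, 0.5]])
print(induce_alignments(cross_attn))   # [(0, 0), (1, 1), (2, 2)]
```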