2020
DOI: 10.48550/arxiv.2009.09672
Preprint

Alleviating the Inequality of Attention Heads for Neural Machine Translation

Abstract: Recent studies show that the attention heads in the Transformer are not equal (Michel et al., 2019). We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, in two specific forms. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
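The abstract does not spell out the two HeadMask variants, but the core idea is to mask out a subset of attention heads during training so that no single head becomes indispensable. The following is a minimal sketch of that idea in PyTorch; the module name, the `mask_k` parameter, the uniformly random head selection, and the rescaling of surviving heads are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of random head masking in multi-head attention (PyTorch).
# Assumptions (not from the paper): module/parameter names, dropping mask_k
# heads uniformly at random per training step, and rescaling surviving heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, mask_k: int = 1):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.mask_k = mask_k  # number of heads to mask per training step
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Project and split into heads: (b, n_heads, t, d_head)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v  # (b, n_heads, t, d_head)

        if self.training and self.mask_k > 0:
            # Zero out mask_k randomly chosen heads for this step.
            keep = torch.ones(self.n_heads, device=x.device)
            drop = torch.randperm(self.n_heads, device=x.device)[: self.mask_k]
            keep[drop] = 0.0
            # Rescale the surviving heads to preserve the expected output scale.
            keep = keep * self.n_heads / (self.n_heads - self.mask_k)
            heads = heads * keep.view(1, -1, 1, 1)

        out = heads.transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        return self.out_proj(out)
```

A variant that masks heads based on an importance score rather than uniformly at random would replace the `randperm` selection with a scoring rule; the abstract only states that two specific forms are proposed.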

Cited by 1 publication
References 10 publications