Alleviating the Inequality of Attention Heads for Neural Machine Translation

Sun, Zewei; Huang, Shujian; Dai, Xinyu; Chen, Jiajun

doi:10.48550/arxiv.2009.09672

2020

Preprint

Abstract: Recent studies show that the attention heads in Transformer are not equal (Michel et al., 2019). We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, in two specific ways. Experiments show that translation improvements are achieved on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
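
The abstract does not spell out the two masking variants, so the following is only a rough sketch of what masking attention heads during training might look like. It assumes that heads are chosen uniformly at random for each batch and that a masked head's output is simply zeroed. The function names `random_head_mask` and `apply_head_mask` are hypothetical and are not taken from the paper.

```python
import random


def random_head_mask(num_heads, num_masked, rng):
    """Return a list of 0.0/1.0 flags with exactly `num_masked` zeros.

    Assumption: heads are picked uniformly at random; the paper may
    select heads differently.
    """
    if not 0 <= num_masked < num_heads:
        raise ValueError("num_masked must be in [0, num_heads)")
    masked = set(rng.sample(range(num_heads), num_masked))
    return [0.0 if h in masked else 1.0 for h in range(num_heads)]


def apply_head_mask(head_outputs, mask):
    """Zero the output vector of every masked head.

    head_outputs: one vector (list of floats) per attention head.
    """
    return [[m * x for x in vec] for vec, m in zip(head_outputs, mask)]


# Toy example: 4 heads, each producing a 2-dimensional output.
rng = random.Random(0)
heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mask = random_head_mask(num_heads=4, num_masked=2, rng=rng)
masked_heads = apply_head_mask(heads, mask)
```

In a real training loop the mask would presumably be resampled for every batch and disabled at inference time, much like dropout; that is also an assumption rather than something the abstract states.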

Cited by 1 publication

References 10 publications

UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost

Wu, Wu, Meng et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

The Transformer architecture has achieved great success in a wide range of natural language processing tasks. The over-parameterization of the Transformer model has motivated many works that try to alleviate its overfitting for better performance. With some exploration, we find that simple techniques such as dropout can greatly boost model performance when carefully designed. Therefore, in this paper, we integrate different dropout techniques into the training of Transformer models. Specifically, we propose an approach named UniDrop that unites three dropout techniques, from fine-grained to coarse-grained: feature dropout, structure dropout, and data dropout. Theoretically, we demonstrate that these three dropouts play different roles from a regularization perspective. Empirically, we conduct experiments on both neural machine translation and text classification benchmark datasets. Extensive results indicate that a Transformer with UniDrop can achieve around 1.5 BLEU improvement on IWSLT14 translation tasks, and better classification accuracy even when using a strong pre-trained RoBERTa as the backbone.
