Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.302

UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost

Abstract: Transformer architecture achieves great success in abundant natural language processing tasks. The over-parameterization of the Transformer model has motivated plenty of works to alleviate its overfitting for superior performances. With some explorations, we find simple techniques such as dropout can greatly boost model performance with a careful design. Therefore, in this paper, we integrate different dropout techniques into the training of Transformer models. Specifically, we propose an approach named UniDrop…
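The abstract describes the idea only at a high level. As a rough illustration of what "integrating different dropout techniques" into a Transformer can look like, here is a minimal PyTorch sketch assuming the three dropout granularities the paper groups together (feature dropout on sublayer outputs, structure dropout that skips whole layers, and data dropout on input tokens); the class, function, and parameter names (UniDropEncoderLayer, data_dropout, p_feature, p_structure) are ours for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class UniDropEncoderLayer(nn.Module):
    """Illustrative encoder layer combining two of the three dropout levels.

    Hypothetical sketch, not the paper's released code:
      - feature dropout: element-wise dropout on each sublayer output
      - structure dropout: skip the entire layer with some probability
        during training (LayerDrop-style)
    """

    def __init__(self, d_model=512, nhead=4, p_feature=0.3, p_structure=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.feature_drop = nn.Dropout(p_feature)  # feature dropout
        self.p_structure = p_structure             # structure (layer) dropout

    def forward(self, x):
        # Structure dropout: during training, skip this layer with prob p.
        if self.training and torch.rand(()) < self.p_structure:
            return x
        h, _ = self.attn(x, x, x)
        x = self.norm1(x + self.feature_drop(h))
        x = self.norm2(x + self.feature_drop(self.ffn(x)))
        return x

def data_dropout(tokens, p=0.1, pad_id=0):
    """Data dropout: randomly mask input tokens during training (illustrative)."""
    mask = torch.rand_like(tokens, dtype=torch.float) < p
    return tokens.masked_fill(mask, pad_id)
```

All three mechanisms are active only during training: `model.eval()` disables `nn.Dropout`, and the explicit `self.training` check handles the layer-skipping, so inference cost is unchanged, consistent with the paper's "without extra cost" claim.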

Cited by 18 publications (3 citation statements)
References 28 publications (25 reference statements)
“…By contrast, CipherDAug trains a single model, and improves the baseline transformer by 2.9 BLEU points on IWSLT14 De→En and about 2.2 BLEU points on the smaller datasets.

Method                          Params        BLEU
BERT Fuse (Zhu et al., 2020)    1x (+BERT)    36.11
MAT (Fan et al., 2020)          0.9x          36.22
UniDrop (Wu et al., 2021b)      1x            36.88
R-Drop (Liang et al., 2021)     1x            37.25
BiBERT (Xu et al., 2021)        1x (+BERT)    37.50…”

Section: Results
confidence: 99%
“…Method                          BLEU
Macaron Net (Lu et al., 2020)     35.40
BERT Fuse (Zhu et al., 2020)      36.11
MAT (Fan et al., 2020)            36.22
UniDrop (Wu et al., 2021b)        36.88
R-Drop (Liang et al., 2021)       37.25
BiBERT (Xu et al., 2021)          37.50…”

Section: A4 Comparison With Other Methods
confidence: 99%
“…Method                          BLEU
Macaron Net (Lu et al., 2020)     35.40
BERT Fuse (Zhu et al., 2020)      36.11
MAT                               36.22
UniDrop (Wu et al., 2021b)        36.88
R-Drop (Liang et al., 2021)       37.25
BiBERT (Xu et al., 2021)          37.50…”

Section: A4 Comparison With Other Methods
confidence: 99%