R-Drop: Regularized Dropout for Neural Networks
2021 | Preprint
DOI: 10.48550/arxiv.2106.14448

Cited by 9 publications (14 citation statements)
References 29 publications
“…There are only a few related studies in the fields of natural language processing [9] and machine learning [29, 36, 66]. Specifically, Gao et al. [9] introduce dropout as an alternative to data augmentation in contrastive learning for sequence representation learning, whereas we focus on the inconsistency introduced by dropout in the data-sparsity SR task.…”
Section: Related Work
confidence: 99%
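The excerpt above describes dropout being used in place of explicit data augmentation for contrastive sequence representation learning. A minimal sketch of that idea, assuming a PyTorch encoder with dropout layers; the names `encoder`, `batch`, and `temperature` are placeholders, not taken from the cited papers:

```python
# Illustrative sketch (not the cited authors' code) of dropout-as-augmentation
# contrastive learning attributed to Gao et al. [9]: the same batch is encoded twice
# with dropout active, so the two passes give two "views" of each sequence; the
# matching pair is the positive and the other in-batch examples are negatives.
import torch
import torch.nn.functional as F

def dropout_contrastive_loss(encoder, batch, temperature=0.05):
    encoder.train()                               # keep dropout active
    z1 = F.normalize(encoder(batch), dim=-1)      # view 1: (batch_size, dim)
    z2 = F.normalize(encoder(batch), dim=-1)      # view 2: different dropout mask
    sim = z1 @ z2.t() / temperature               # pairwise cosine similarities, (B, B)
    targets = torch.arange(sim.size(0), device=sim.device)  # diagonal = positive pairs
    return F.cross_entropy(sim, targets)
```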
“…Ma et al. [36] and Zolna et al. [66] mainly focus on the gap between training and testing and use L2 regularization on the representation space, which is less effective in the data-sparsity setting, as reflected by the marginal-to-no performance improvements in Section 5. Unlike work that introduces a regularization objective in the output space to constrain the randomness of the sub-models produced by dropout [29], we address consistency training for the data-sparsity SR task in both the representation and the output space. We also propose a simple yet effective regularization strategy in the representation space to complement and align with the output-space consistency loss.…”
Section: Related Work
confidence: 99%
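For the L2 representation-space regularization the excerpt attributes to Ma et al. [36] and Zolna et al. [66], a rough sketch of one common formulation, assuming a PyTorch encoder whose train/eval modes toggle dropout; all names are placeholders and this is not the cited authors' implementation:

```python
# Illustrative sketch, under assumptions: penalize the L2 distance between the
# dropout (training-mode) representation and the dropout-free (eval-mode)
# representation of the same input, shrinking the train/test gap caused by dropout.
import torch

def l2_representation_gap(encoder, batch, beta=0.1):
    encoder.train()                    # dropout on, as during training
    h_train = encoder(batch)
    encoder.eval()                     # dropout off, as at test time
    with torch.no_grad():
        h_eval = encoder(batch)
    encoder.train()                    # restore training mode
    return beta * torch.mean((h_train - h_eval) ** 2)
```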
“…Note that we enforce the predicted logits to be evenly distributed when the model is fed with negatively perturbed data. In addition, we apply the simple yet effective bidirectional KL regularization trick (Liang et al. 2021) to the above loss, which encourages the output distributions of the different sub-models generated by dropout to be consistent with each other:…”
Section: Model Training
confidence: 99%
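The bidirectional KL trick referenced here (Liang et al. 2021, R-Drop) can be summarized with a minimal PyTorch sketch, assuming a classifier `model` with dropout and placeholder names `inputs`, `labels`, and `alpha` for the regularization weight:

```python
# Minimal sketch of the bidirectional KL regularization described in the excerpt:
# two forward passes with independent dropout masks, a task loss averaged over both,
# and a symmetric KL term that pulls the two predicted distributions together.
import torch.nn.functional as F

def r_drop_loss(model, inputs, labels, alpha=1.0):
    logits1 = model(inputs)            # first stochastic sub-model (dropout mask 1)
    logits2 = model(inputs)            # second stochastic sub-model (dropout mask 2)

    # Task loss averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))

    # Symmetric (bidirectional) KL between the two output distributions.
    logp1 = F.log_softmax(logits1, dim=-1)
    logp2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
                + F.kl_div(logp2, logp1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl
```

The symmetric form makes the two dropout sub-models regularize each other in both directions, which is what the excerpt means by the output distributions being "consistent with each other".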