R-Drop: Regularized Dropout for Neural Networks

Liang, Xiaobo; Wu, Lijun; Li, Juntao; Wang, Yue; Meng, Qi; Qin, Tao; Chen, Wei; Zhang, Min; Liu, Tie-Yan

doi:10.48550/arxiv.2106.14448

Cited by 9 publications

(14 citation statements)

References 29 publications


“…There are only a few related researches in the fields of natural language process [9] and machine learning [29,36,66]. Specifically, Gao et al [9] introduce dropout as the alternative of data augmentation into contrasting learning for sequence representation learning while we focus on the inconsistency introduced by dropout in the data sparsity SR task.…”

Section: Related Work (mentioning)

confidence: 99%

“…Ma et al [36], Zolna et al [66] mainly focus on the gap between training and testing and utilize L2 for regularizing the representation space, which is less effective in the data sparsity setting, represented by the marginal to none performance improvements in Section 5. Different from introducing a regularization objective in the output space to constrain the randomness of sub-models brought by dropout [29], we focus on the consistency training of the data sparsity SR task from both the representation and output space. We also propose a simple yet effective regularization strategy in the representation space to compensate and align the output space consistency loss.…”

Section: Related Work (mentioning)

confidence: 99%

“…Inspired by recent studies on dropout [29], we enhance the user representation from the perspective of reducing the model inconsistency and gap between training and testing. Concretely, we forward twice with different dropouts and learn the consistency between these two representations for each user, i.e., each user interaction sequence $s_u$ passing the forward network twice and obtain two representations $\boldsymbol{s}^{d_1}_{u,t}$ and $\boldsymbol{s}^{d_2}_{u,t}$.…”

Section: Consistency Training (mentioning)

confidence: 99%
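
The two-pass procedure described in the statement above can be illustrated with a minimal Python sketch. It assumes standard inverted dropout and uses squared L2 distance as a stand-in consistency measure; the function names, the dropout rate, the seed, and the distance itself are illustrative assumptions, not the citing paper's implementation.

```python
import random


def dropout(vec, p, rng):
    """Inverted dropout: zero each element with probability p and
    rescale the surviving elements by 1 / (1 - p)."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_pass_representations(vec, p, seed):
    """Apply dropout twice to the same input, drawing an independent
    mask for each pass (a stand-in for two forward passes)."""
    rng = random.Random(seed)
    return dropout(vec, p, rng), dropout(vec, p, rng)


# Hypothetical user representation (made-up values for illustration).
s_u = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]

s_d1, s_d2 = two_pass_representations(s_u, p=0.5, seed=0)

# The consistency gap is what a representation-space regularizer
# would try to shrink during training.
gap = squared_l2(s_d1, s_d2)
print("pass 1:", s_d1)
print("pass 2:", s_d2)
print("squared L2 gap:", gap)
```

With `p=0.0` both passes return the input unchanged and the gap is zero, which mirrors the absence of dropout at test time.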

“…Inspired by the recent observation on the multi-head attention model that a very simple regularization strategy imposed on the output space of supervised tasks yields striking performance improvement [29] (achieving SOTA on many challenging tasks), we propose to thoroughly explore the effect of consistency training for the sequential recommendation task. We first introduce the simple bidirectional KL divergence regularization into the output space to constrain the inconsistency between two forward passes with different dropouts.…”

Section: Introduction (mentioning)

confidence: 99%
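
As a rough illustration of the output-space regularizer mentioned above, the sketch below computes a bidirectional (symmetric) KL divergence between the softmax outputs of two forward passes. The logits are made-up values standing in for two dropout sub-models, and the helper names, the `eps` smoothing, and the 1/2 averaging are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kl(logits_1, logits_2):
    """Average of KL(p1 || p2) and KL(p2 || p1), so the penalty is
    symmetric in the two forward passes."""
    p1, p2 = softmax(logits_1), softmax(logits_2)
    return 0.5 * (kl(p1, p2) + kl(p2, p1))


# Hypothetical logits from two passes over the same input.
logits_pass_1 = [2.0, 0.5, -1.0]
logits_pass_2 = [1.6, 0.9, -0.7]

penalty = bidirectional_kl(logits_pass_1, logits_pass_2)
print("bidirectional KL penalty:", penalty)
```

In training, a penalty of this kind would typically be added to the task loss with a weighting coefficient, so that the two sub-models are pushed toward the same output distribution.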


C$^2$-Rec: An Effective Consistency Constraint for Sequential Recommendation

Liu, Liu, Rongqin et al. 2021

Preprint

Self Cite


Sequential recommendation methods play an important role in real-world recommender systems. These systems are able to catch user preferences by taking advantage of historical records and then performing recommendations. Contrastive learning (CL) is a cutting-edge technology that can assist us in obtaining informative user representations, but these CL-based models need subtle negative sampling strategies, tedious data augmentation methods, and heavy hyper-parameters tuning work. In this paper, we introduce another way to generate better user representations and recommend more attractive items to users. Particularly, we put forward an effective Consistency Constraint for sequential Recommendation (C$^2$-Rec) in which only two extra training objectives are used without any structural modifications and data augmentation strategies. Substantial experiments have been conducted on three benchmark datasets and one real industrial dataset, which proves that our proposed method outperforms SOTA models substantially. Furthermore, our method needs much less training time than those CL-based models. Online AB-test on real-world recommendation systems also achieves 10.141% improvement on the click-through rate and 10.541% increase on the average click number per capita. The code is available at https://github.com/zhengrongqin/C2-Rec. CCS Concepts: Information systems → Recommender systems.


“…Note that we enforce the predicted logits to be evenly distributed when the model is fed by the negatively perturbed data. In addition, we introduce a simple yet effective bidirectional KL regularization trick (Liang et al 2021) in the above loss, which enables the output distributions of different sub-models generated by dropout to be consistent with each other:…”

Section: Model Training (mentioning)

confidence: 99%
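
The citing paper's own equation is cut off in the excerpt above and is not reconstructed here. For orientation only, a bidirectional KL regularizer in the style of R-Drop is commonly written as follows, where $P_1$ and $P_2$ denote the output distributions of two forward passes of the same input $x$ under different dropout masks, $\mathcal{L}_{task}$ is the supervised task loss, and $\alpha$ is a weighting hyperparameter. The symbols are assumptions chosen for this note and may differ from the citing paper's notation.

$$
\mathcal{L}_{KL} = \frac{1}{2}\Big[ D_{KL}\big(P_1(y \mid x) \,\|\, P_2(y \mid x)\big) + D_{KL}\big(P_2(y \mid x) \,\|\, P_1(y \mid x)\big) \Big],
\qquad
\mathcal{L} = \mathcal{L}_{task} + \alpha \, \mathcal{L}_{KL}
$$

Because the two KL terms are averaged, the penalty is symmetric in the two passes and vanishes only when both sub-models produce the same distribution.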

Multi-Domain Transformer-Based Counterfactual Augmentation for Earnings Call Analysis

Yuan, Zhu, Zhang et al. 2021

Preprint


Earnings call (EC), as a periodic teleconference of a publicly traded company, has been extensively studied as an essential market indicator because of its high analytical value in corporate fundamentals. The recent emergence of deep learning techniques has shown great promise in creating automated pipelines to benefit the EC-supported financial applications. However, these methods presume all included contents to be informative without refining valuable semantics from long-text transcript and suffer from EC scarcity issue. Meanwhile, these black-box methods possess inherent difficulties in providing human-understandable explanations. To this end, in this paper, we propose a Multi-Domain Transformer-Based Counterfactual Augmentation, named MTCA, to address the above problems. Specifically, we first propose a transformer-based EC encoder to attentively quantify the task-inspired significance of critical EC content for market inference. Then, a multi-domain counterfactual learning framework is developed to evaluate the gradient-based variations after we perturb limited EC informative texts with plentiful cross-domain documents, enabling MTCA to perform unsupervised data augmentation. As a bonus, we discover a way to use non-training data as instance-based explanations for which we show the result with case studies. Extensive experiments on the real-world financial datasets demonstrate the effectiveness of interpretable MTCA for improving the volatility evaluation ability of the state-of-the-art by 14.2% in accuracy. Footnote 1: https://www.fool.com/earnings/calltranscripts/2021/07/30/pge-corporation-pcg-q2-2021-earningscall-transcri/


GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-supervised Learning and Explicit Policy Injection

Dai, Zheng et al. 2022

AAAI


Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0 and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings. For reproducibility, we release the code and data at https://github.com/siat-nlp/GALAXY.
