Proceedings of the 5th Workshop on Representation Learning for NLP 2020
DOI: 10.18653/v1/2020.repl4nlp-1.8
Adversarial Training for Commonsense Inference

Abstract: We propose an AdversariaL training algorithm for commonsense InferenCE (ALICE). We apply small perturbations to word embeddings and minimize the resultant adversarial risk to regularize the model. We exploit a novel combination of two different approaches to estimate these perturbations: 1) using the true label and 2) using the model prediction. Without relying on any human-crafted features, knowledge bases or additional datasets other than the target datasets, our model boosts the finetuning performance of Ro…
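The abstract's two perturbation estimates can be made concrete with a toy example. The following is a minimal numpy sketch, not the ALICE implementation: a linear softmax classifier over a small embedding, with hypothetical values for the radius eps and offset xi. Approach 1 builds the perturbation from the true label; approach 2 builds it from the model's own prediction, which requires a small random offset because the gradient of the divergence vanishes at the clean point.

```python
import numpy as np

# Toy illustration of the two perturbation estimates from the abstract.
# All shapes, weights, and hyperparameters are hypothetical.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # linear classifier: 4-dim embedding, 3 classes
x = rng.normal(size=4)        # one "word embedding"
y = 1                         # true label
eps, xi = 0.1, 1e-3           # perturbation radius, finite-difference offset

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss_grad(x_in, target):
    """Cross-entropy between softmax(W^T x_in) and a target distribution,
    and its gradient with respect to the embedding x_in."""
    p = softmax(x_in @ W)
    loss = -float(np.sum(target * np.log(p)))
    grad = W @ (p - target)   # chain rule: d CE / d logits = p - target
    return loss, grad

# 1) Perturbation from the true label (standard adversarial training):
# one normalized gradient-ascent step on the embedding.
onehot = np.eye(3)[y]
clean_loss, g_true = ce_loss_grad(x, onehot)
delta_true = eps * g_true / (np.linalg.norm(g_true) + 1e-12)

# 2) Perturbation from the model's own prediction (virtual adversarial
# style): the gradient of the divergence is zero at the clean point, so
# start from a small random offset xi * d before taking the gradient.
p_model = softmax(x @ W)
d = rng.normal(size=4)
d /= np.linalg.norm(d)
_, g_virt = ce_loss_grad(x + xi * d, p_model)
delta_virt = eps * g_virt / (np.linalg.norm(g_virt) + 1e-12)

# The true-label perturbation increases the loss; training then minimizes
# this adversarial loss with respect to the model parameters.
adv_loss, _ = ce_loss_grad(x + delta_true, onehot)
```

Training would minimize a combination of the clean loss and the loss at the perturbed embedding x + delta; how the two perturbation estimates are combined is the paper's contribution and is not reproduced here.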

Cited by 19 publications (16 citation statements); references 20 publications.
“…Dropout is a widely used approach in deep learning to improve model generalization (Srivastava et al., 2014). For adversarial learning methods, the main theme is reducing the model's sensitivity to small input perturbations (Goodfellow et al., 2014; Madry et al., 2018), which has recently been applied to both fine-tuning (Jiang et al., 2020; Pereira et al., 2020; Zhu et al., 2020; Li and Qiu, 2020) and pre-training. However, models trained with adversarial learning are found to have at-odds generalization (Tsipras et al., 2019; Zhang et al., 2019).…”
Section: Related Work
confidence: 99%
“…Subtask 1 and Subtask 2). Adversarial training (ADV): Adversarial training has proven effective in improving model generalization and robustness in computer vision (Madry et al., 2017; Goodfellow et al., 2014) and, more recently, in NLP (Zhu et al., 2019; Liu et al., 2020a; Pereira et al., 2020). It works by augmenting the input with a small perturbation that maximizes the adversarial loss:…”
Section: Training Procedures
confidence: 99%
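The inner maximization mentioned in this statement is typically approximated by a few steps of projected gradient ascent on the embedding perturbation. Below is a minimal numpy sketch under assumed shapes and hypothetical hyperparameters (eps, alpha, steps), not the cited implementation:

```python
import numpy as np

# Inner maximization via projected gradient ascent over an L2 ball of
# radius eps around the embedding x. Toy linear classifier throughout.
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))
x = rng.normal(size=4)
y = 2
eps, alpha, steps = 0.25, 0.05, 5   # hypothetical hyperparameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss_grad(x_adv):
    """Cross-entropy loss at the (perturbed) embedding and its gradient."""
    p = softmax(x_adv @ W)
    onehot = np.eye(3)[y]
    return -float(np.log(p[y])), W @ (p - onehot)

delta = np.zeros(4)
for _ in range(steps):
    _, g = ce_loss_grad(x + delta)
    delta = delta + alpha * g / (np.linalg.norm(g) + 1e-12)  # ascent step
    n = np.linalg.norm(delta)
    if n > eps:                     # project back onto the feasible ball
        delta *= eps / n

loss_clean, _ = ce_loss_grad(x)
loss_adv, _ = ce_loss_grad(x + delta)
```

The outer loop of adversarial training then minimizes the loss at x + delta with respect to the model parameters; the number of ascent steps trades off attack strength against training cost.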
“…where the inner maximization can be solved by projected gradient descent (Madry et al., 2017). Recently, adversarial training has been successfully applied to NLP as well (Zhu et al., 2019; Pereira et al., 2020). In our experiments, we use SMART, which instead regularizes the standard training objective using virtual adversarial training (Miyato et al., 2018):…”
Section: Training Procedures
confidence: 99%
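The virtual adversarial regularizer referenced here can be sketched in a few lines: no labels are used, the model's clean prediction serves as the target, and a symmetric KL term penalizes divergence between clean and perturbed predictions. This is a toy numpy illustration with hypothetical values for eps and xi, not the SMART implementation:

```python
import numpy as np

# Virtual-adversarial smoothness regularizer in the style of
# Miyato et al. (2018), on a toy linear model.
rng = np.random.default_rng(2)
W = rng.normal(size=(4, 3))
x = rng.normal(size=4)
eps, xi = 0.1, 1e-3   # hypothetical radius and finite-difference offset

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

p_clean = softmax(x @ W)

# Worst-case direction: random unit start, one gradient step on the
# divergence, taken at a small offset xi*d where the gradient is nonzero.
d = rng.normal(size=4)
d /= np.linalg.norm(d)
p_near = softmax((x + xi * d) @ W)
g = W @ (p_near - p_clean)      # grad of CE(p_clean, p(x')) at x + xi*d
delta = eps * g / (np.linalg.norm(g) + 1e-12)

# Symmetric KL between clean and perturbed predictions, added to the
# task loss as a regularizer during fine-tuning.
p_adv = softmax((x + delta) @ W)
reg = kl(p_clean, p_adv) + kl(p_adv, p_clean)
```

Because the target is the model's own prediction rather than a label, this regularizer can also be applied to unlabeled inputs, which is what distinguishes the "virtual" variant from standard adversarial training.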
“…In the field of NLP, Miyato et al. (2017) applied adversarial training to text classification tasks and improved model performance. Since then, many AT methods have been proposed (Wu et al., 2017; Yasunaga et al., 2018; Bekoulis et al., 2018; Zhu et al., 2020; Jiang et al., 2019; Pereira et al., 2020). They mostly adopt a general AT strategy, but focus less on adapting AT to NLP tasks.…”
Section: Introduction
confidence: 99%