2022
DOI: 10.1609/aaai.v36i10.21362

Improved Text Classification via Contrastive Adversarial Training

Abstract: We propose a simple and general method to regularize the fine-tuning of Transformer-based encoders for text classification tasks. Specifically, during fine-tuning we generate adversarial examples by perturbing the word embedding matrix of the model and perform contrastive learning on clean and adversarial examples in order to teach the model to learn noise-invariant representations. By training on both clean and adversarial examples along with the additional contrastive objective, we observe consistent improve…
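
To make the training step described in the abstract concrete, below is a minimal sketch, assuming a PyTorch setup in which `encoder(inputs_embeds=..., attention_mask=...)` returns a pooled sentence representation and `classifier` is a linear head; the names `epsilon`, `tau`, and `lambda_ctr` are illustrative, not the paper's. This is not the authors' implementation: the adversarial example comes from a single FGSM-style step on the embedding output, and the contrastive term is an InfoNCE-style loss that pairs each clean example with its adversarial view.

```python
# Sketch only: single-step adversarial perturbation of the word embeddings plus a
# contrastive loss between clean and adversarial representations (assumptions above).
import torch
import torch.nn.functional as F

def contrastive_adversarial_loss(encoder, classifier, embedding_layer,
                                 input_ids, attention_mask, labels,
                                 epsilon=1.0, tau=0.1, lambda_ctr=1.0):
    # Clean forward pass, keeping the embedding output in the graph so we can
    # take its gradient.
    clean_emb = embedding_layer(input_ids)                                      # [B, T, H]
    h_clean = encoder(inputs_embeds=clean_emb, attention_mask=attention_mask)   # [B, H]
    ce_clean = F.cross_entropy(classifier(h_clean), labels)

    # FGSM-style perturbation: move the embeddings along the loss gradient.
    grad, = torch.autograd.grad(ce_clean, clean_emb, retain_graph=True)
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    adv_emb = clean_emb + delta                              # delta carries no gradient

    # Forward pass on the adversarial example.
    h_adv = encoder(inputs_embeds=adv_emb, attention_mask=attention_mask)
    ce_adv = F.cross_entropy(classifier(h_adv), labels)

    # InfoNCE-style contrastive term: each clean representation should be closest
    # to its own adversarial view within the batch.
    z_clean = F.normalize(h_clean, dim=-1)
    z_adv = F.normalize(h_adv, dim=-1)
    logits = z_clean @ z_adv.t() / tau                       # [B, B] similarity matrix
    targets = torch.arange(z_clean.size(0), device=logits.device)
    contrastive = F.cross_entropy(logits, targets)

    # Train on clean + adversarial examples with the added contrastive objective.
    return ce_clean + ce_adv + lambda_ctr * contrastive
```

In such a setup the returned loss would be backpropagated once per batch, updating the encoder, classifier, and embedding weights together.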

Cited by 45 publications (14 citation statements). References 40 publications.
“…A novel technique for regularizing the fine-tuning of Transformer-based encoders for text classification problems is provided in Ref. 18. The model’s word embedding matrix is perturbed to generate adversarial examples, and contrastive learning is performed on clean and adversarial examples to teach the model noise-invariant representations.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…AraBERTv2 achieved the highest accuracy, precision, and F1-score of 97% on the description dataset among all other transformer architectures; it is restricted to Arabic news content only.
Ref. 16 (2022). Model: pre-trained BERT model. Objective: detect fake news with a region-based distributed approach. Dataset: FakeNewsNet. Metrics: precision, recall, and F1-score. Result: the model achieved an accuracy of 91%. Limitation: the distributed framework still needs to be optimized for the mobile crowdsensing environment.
Ref. 17 (2021). Model: a hybrid model combining an RNN with bidirectional GRUs and an SVM. Objective: identify real and fake news. Dataset: FakeNewsNet. Metrics: accuracy, precision, recall, and F1-score. Result: the suggested methodology performed better than cutting-edge techniques. Limitation: SVM performance depends on the size of the feature vector; here the minimum feature-vector size was restricted to 512 units, the output of the GRUs.
Ref. 18 (2022). Method: perturbation of the word embedding matrix and contrastive learning with Transformers such as BERT and RoBERTa. Objective: regularizing Transformer-based encoders for text classification problems. Datasets: GLUE benchmark tasks and three intent classification datasets. Metric: accuracy. Results: an improvement of 1.7% on average over BERT-Large and 1.3% over RoBERTa-Large; on intent classification tasks, the fine-tuned RoBERTa-Large outperforms the RoBERTa-Large baseline by 1% on the entire test sets and 2% on the more challenging test sets. Limitation: modest perturbations to input vector entries may not be appropriate for sparse, high-dimensional inputs.
Ref. 19 (2022). Method: CRAL (consistent regularization for adaptation learning) and VAT (virtual adversarial training) with entropy minimization. Objective: adversarial training for specific domain adaptation. Datasets: two MDTC (multi-domain text classification) benchmarks. Metric: accuracy. Results: 88% and 90% on the two benchmarks. Limitation: accuracy is compromised in an unseen domain.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…In particular, with the emergence of models such as BERT [18], XLNet [34], RoBERTa [13], T5 [15], and ELECTRA [35], research has been conducted on applying AT to the fine-tuning of pre-trained language models. AT has been shown to improve performance on text classification tasks [36], and it is also effective for fine-tuning and pre-training language models [37]–[39]. It has been empirically demonstrated that AT is effective when applied to a BERT model [40].…”
Section: B. Adversarial Training (mentioning)
confidence: 99%
“…Their research shows that RoBERTa-Large also performs 1–2% better than the RoBERTa-Large baseline. In reference 17, the authors propose a novel method called CRAL (Consistent Regularization for Adaptation Learning) for performing domain adaptation. The approach creates two distinct shared latent spaces, performs domain alignment for each space, and penalizes any inconsistency between the two alignments in their predictions for unlabeled data.…”
Section: Adversarial Training (mentioning)
confidence: 99%
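
The consistency penalty in the CRAL description above can be sketched roughly as follows. This is not the CRAL authors' code: it assumes the two shared latent spaces each produce class logits for the same unlabeled batch, and a symmetric KL term stands in for the inconsistency penalty.

```python
# Rough sketch of penalizing prediction inconsistency between two latent spaces
# on unlabeled data (symmetric KL assumed; names are illustrative).
import torch.nn.functional as F

def prediction_consistency_loss(logits_space1, logits_space2):
    logp1 = F.log_softmax(logits_space1, dim=-1)
    logp2 = F.log_softmax(logits_space2, dim=-1)
    kl_12 = F.kl_div(logp1, logp2, reduction="batchmean", log_target=True)  # KL(p2 || p1)
    kl_21 = F.kl_div(logp2, logp1, reduction="batchmean", log_target=True)  # KL(p1 || p2)
    return 0.5 * (kl_12 + kl_21)
```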