Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics 2023
DOI: 10.18653/v1/2023.eacl-main.216

Learning to Ignore Adversarial Attacks

Yiming Zhang, Yangqiaoyu Zhou, Samuel Carton, et al.

Abstract: Despite the strong performance of current NLP models, they can be brittle against adversarial attacks. To enable effective learning against adversarial inputs, we introduce the use of rationale models that can explicitly learn to ignore attack tokens. We find that the rationale models can successfully ignore over 90% of attack tokens. This approach leads to consistent and sizable improvements (∼10%) over baseline models in robustness on three datasets for both BERT and RoBERTa, and also reliably outperforms da…
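As a rough illustration of the idea in the abstract, the sketch below shows one common rationale-model architecture: an extractor scores each token, a hard binary mask drops low-scoring tokens, and a predictor classifies only the kept tokens, so attack tokens are "ignored" whenever the extractor assigns them low scores. This is a minimal PyTorch sketch under assumed names (RationaleModel, a 0.5 keep threshold), not the authors' released implementation.

import torch
import torch.nn as nn

class RationaleModel(nn.Module):
    """Extractor picks a token mask; predictor sees only unmasked tokens.
    Hypothetical class, illustrating the extractor/predictor split only."""

    def __init__(self, vocab_size, hidden=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.extractor = nn.Linear(hidden, 1)          # per-token "keep" logit
        self.predictor = nn.Linear(hidden, num_classes)

    def forward(self, token_ids):
        emb = self.embed(token_ids)                    # (batch, seq, hidden)
        keep_prob = torch.sigmoid(self.extractor(emb)) # (batch, seq, 1)
        # Straight-through trick: hard 0/1 mask in the forward pass,
        # gradients flow through the soft keep probabilities.
        hard = (keep_prob > 0.5).float()
        mask = hard + keep_prob - keep_prob.detach()
        masked = emb * mask                            # ignored tokens zeroed out
        pooled = masked.sum(1) / mask.sum(1).clamp(min=1.0)
        return self.predictor(pooled), mask.squeeze(-1)

model = RationaleModel(vocab_size=1000)
logits, mask = model(torch.randint(0, 1000, (2, 12)))
print(logits.shape, mask.shape)  # torch.Size([2, 2]) torch.Size([2, 12])

The returned mask is what makes the rationale explicit: supervising it to zero out known attack tokens is one natural way to make the model "learn to ignore" them, which is the behavior the abstract reports.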

Cited by 0 publications
References: 29 publications