Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.400
Contextualized Perturbation for Textual Adversarial Attack

Abstract: Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness. Existing techniques for generating such examples are typically driven by local heuristic rules that are agnostic to the context, often resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs through a mask-then-infill procedure. CLARE builds on …
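To make the mask-then-infill procedure concrete, the sketch below generates contextual candidates for CLARE's three perturbation types. It assumes HuggingFace's fill-mask pipeline with a distilroberta-base checkpoint as a stand-in for the paper's infilling model; the function names are illustrative and the victim-model scoring CLARE uses to select among candidates is omitted.

```python
from transformers import pipeline

# Masked language model used to propose in-context infills; distilroberta-base
# is an assumed stand-in for the DistilRoBERTa model the paper describes.
fill_mask = pipeline("fill-mask", model="distilroberta-base")
MASK = fill_mask.tokenizer.mask_token  # "<mask>" for RoBERTa-family models

def replace_candidates(tokens, i, top_k=5):
    """Replace: mask the i-th token and ask the LM for in-context fillers."""
    masked = " ".join(tokens[:i] + [MASK] + tokens[i + 1:])
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]

def insert_candidates(tokens, i, top_k=5):
    """Insert: place a new mask after the i-th token."""
    masked = " ".join(tokens[:i + 1] + [MASK] + tokens[i + 1:])
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]

def merge_candidates(tokens, i, top_k=5):
    """Merge: collapse the i-th and (i+1)-th tokens into a single mask."""
    masked = " ".join(tokens[:i] + [MASK] + tokens[i + 2:])
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]

tokens = "the movie was surprisingly good".split()
print(replace_candidates(tokens, 3))  # contextual alternatives to "surprisingly"
```

A full attack would score these candidates against the victim classifier and apply the perturbation that most reduces the probability of the gold label, repeating until the prediction flips.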

Cited by 99 publications (54 citation statements). References 44 publications.

“…At the word level, TextFooler ranks the words in a sample by prediction relevance and replaces the most important ones using a word embedding optimized for synonyms (Mrkšić et al., 2016). BERT-Attack (Li et al., 2020b) and CLARE (Li et al., 2021) operate similarly, but they use BERT and DistilRoBERTa (Sanh et al., 2019; Liu et al., 2019b), respectively, as language models to suggest potential candidates. CLARE supports token replacements, insertions, and merges.…”
Section: Attack Strategies
Confidence: 99%
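The "prediction relevance" ranking attributed to TextFooler can be sketched with a simple deletion-based importance score, as below; `predict_proba` is a hypothetical wrapper around the victim classifier returning class probabilities, not part of any library.

```python
def rank_words_by_importance(tokens, predict_proba):
    """Rank token indices by how much deleting each token hurts the victim
    model's confidence in its original prediction (most important first)."""
    base = predict_proba(" ".join(tokens))
    label = base.index(max(base))  # the model's original predicted class
    scores = []
    for i in range(len(tokens)):
        ablated = " ".join(tokens[:i] + tokens[i + 1:])  # drop the i-th word
        scores.append(base[label] - predict_proba(ablated)[label])
    return sorted(range(len(tokens)), key=lambda i: -scores[i])
```

The attack then perturbs words in this order, so that few edits are needed before the prediction changes.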
“…One of the first works to use this technique is Alzantot et al. (2018), in which the authors adversarially train a sentiment classification model on the IMDB dataset without success. Later works, such as Li et al. (2020b) and Li et al. (2021), show more promising results: the former uses adversarial training to make a natural language inference model more robust, gaining 15% after-attack accuracy at the cost of a minimal loss in test accuracy. The latter adversarially trains BERT and TextCNN models on the AG News dataset and obtains similar improvements: without any loss of test accuracy, the authors reduce the attack success rate by 12.3% and 3.5% for BERT and TextCNN, respectively.…”
Section: Adversarial Training
Confidence: 99%
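A rough sketch of the augmentation loop these works evaluate follows; `train_epoch` (one standard supervised pass) and `generate_adversarial` (an attack such as CLARE run against the current model) are hypothetical stand-ins, not functions from the cited papers' code.

```python
def adversarial_training(model, train_data, epochs=3):
    """Alternate normal training with augmentation by fresh adversarial examples."""
    data = list(train_data)
    for _ in range(epochs):
        train_epoch(model, data)  # hypothetical: one standard supervised pass
        # Attack the *current* model so the adversarial examples track its
        # present weaknesses, then fold them back into the training set.
        adversarial = [(generate_adversarial(model, text), label)
                       for text, label in train_data]
        data = list(train_data) + adversarial
    return model
```

Regenerating the adversarial examples each epoch is what keeps the augmented data on-policy; reusing a fixed adversarial set tends to give weaker robustness gains.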