Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.400
Contextualized Perturbation for Textual Adversarial Attack

Abstract: Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness. Existing techniques for generating such examples are typically driven by local heuristic rules that are agnostic to the context, often resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs through a mask-then-infill procedure. CLARE builds on …
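To make the mask-then-infill procedure concrete, the sketch below generates contextual candidates for CLARE's three perturbation types. It assumes HuggingFace's fill-mask pipeline with a distilroberta-base checkpoint as a stand-in for the paper's infilling model; the function names are illustrative and the victim-model scoring CLARE uses to select among candidates is omitted.

```python
from transformers import pipeline

# Masked language model used to propose in-context infills; distilroberta-base
# is an assumed stand-in for the DistilRoBERTa model the paper describes.
fill_mask = pipeline("fill-mask", model="distilroberta-base")
MASK = fill_mask.tokenizer.mask_token  # "<mask>" for RoBERTa-family models

def replace_candidates(tokens, i, top_k=5):
    """Replace: mask the i-th token and ask the LM for in-context fillers."""
    masked = " ".join(tokens[:i] + [MASK] + tokens[i + 1:])
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]

def insert_candidates(tokens, i, top_k=5):
    """Insert: place a new mask after the i-th token."""
    masked = " ".join(tokens[:i + 1] + [MASK] + tokens[i + 1:])
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]

def merge_candidates(tokens, i, top_k=5):
    """Merge: collapse the i-th and (i+1)-th tokens into a single mask."""
    masked = " ".join(tokens[:i] + [MASK] + tokens[i + 2:])
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]

tokens = "the movie was surprisingly good".split()
print(replace_candidates(tokens, 3))  # contextual alternatives to "surprisingly"
```

A full attack would score these candidates against the victim classifier and apply the perturbation that most reduces the probability of the gold label, repeating until the prediction flips.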

Cited by 99 publications (54 citation statements). References 44 publications.

“…At the word level, TextFooler ranks the words in a sample by prediction relevance and replaces the most important ones using a word embedding optimized for synonyms (Mrkšić et al., 2016). BERT-Attack (Li et al., 2020b) and CLARE (Li et al., 2021) operate similarly, but they use BERT and DistilRoBERTa (Sanh et al., 2019; Liu et al., 2019b), respectively, as language models to suggest potential candidates. CLARE supports token replacements, insertions, and merges.…”
Section: Attack Strategies
Confidence: 99%
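The "prediction relevance" ranking attributed to TextFooler can be sketched with a simple deletion-based importance score, as below; `predict_proba` is a hypothetical wrapper around the victim classifier returning class probabilities, not part of any library.

```python
def rank_words_by_importance(tokens, predict_proba):
    """Rank token indices by how much deleting each token hurts the victim
    model's confidence in its original prediction (most important first)."""
    base = predict_proba(" ".join(tokens))
    label = base.index(max(base))  # the model's original predicted class
    scores = []
    for i in range(len(tokens)):
        ablated = " ".join(tokens[:i] + tokens[i + 1:])  # drop the i-th word
        scores.append(base[label] - predict_proba(ablated)[label])
    return sorted(range(len(tokens)), key=lambda i: -scores[i])
```

The attack then perturbs words in this order, so that few edits are needed before the prediction changes.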
“…One of the first works to use this technique is Alzantot et al. (2018), in which the authors adversarially train a sentiment classification model on the IMDB dataset without success. Later works, such as Li et al. (2020b) and Li et al. (2021), show more promising results: the former uses adversarial training to make a natural language inference model more robust, gaining 15% after-attack accuracy at the cost of a minimal loss in test accuracy. The latter adversarially trains BERT and TextCNN models on the AG News dataset and obtains similar improvements: without any loss of test accuracy, the authors reduce the attack success rate by 12.3% and 3.5% for BERT and TextCNN, respectively.…”
Section: Adversarial Training
Confidence: 99%
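A rough sketch of the augmentation loop these works evaluate follows; `train_epoch` (one standard supervised pass) and `generate_adversarial` (an attack such as CLARE run against the current model) are hypothetical stand-ins, not functions from the cited papers' code.

```python
def adversarial_training(model, train_data, epochs=3):
    """Alternate normal training with augmentation by fresh adversarial examples."""
    data = list(train_data)
    for _ in range(epochs):
        train_epoch(model, data)  # hypothetical: one standard supervised pass
        # Attack the *current* model so the adversarial examples track its
        # present weaknesses, then fold them back into the training set.
        adversarial = [(generate_adversarial(model, text), label)
                       for text, label in train_data]
        data = list(train_data) + adversarial
    return model
```

Regenerating the adversarial examples each epoch is what keeps the augmented data on-policy; reusing a fixed adversarial set tends to give weaker robustness gains.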