Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1316

Generating Natural Language Adversarial Examples

Abstract: Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the model to misclassify. In the image domain, these perturbations are often virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we…
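In the full paper, the attack sketched in this abstract is a black-box, population-based (genetic) search over word substitutions. The snippet below is a rough, self-contained illustration of that idea only: the synonym table, the toy victim classifier, and all hyperparameters are hypothetical placeholders, not the authors' actual setup (which uses GloVe nearest neighbors, a language-model filter, and trained sentiment/entailment models as victims).

```python
import random

# Everything below is a hypothetical, self-contained illustration; the paper's
# real attack uses GloVe nearest neighbors, a language-model filter, and a
# trained sentiment/entailment model as the victim.
SYNONYMS = {
    "terrible": ["horrible", "awful"],
    "boring": ["dull", "tedious"],
    "movie": ["film"],
}

def victim_negative_prob(words):
    """Toy black-box classifier: probability that the text is 'negative'."""
    cues = {"terrible", "boring"}          # the toy model only keys on these exact words
    return min(1.0, 0.45 * sum(w in cues for w in words))

def mutate(words):
    """Replace one randomly chosen word with an allowed substitute."""
    positions = [i for i, w in enumerate(words) if w in SYNONYMS]
    if not positions:
        return list(words)
    i = random.choice(positions)
    out = list(words)
    out[i] = random.choice(SYNONYMS[words[i]])
    return out

def genetic_attack(words, pop_size=8, generations=20):
    """Population-based search for substitutions that flip the prediction."""
    population = [mutate(words) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [victim_negative_prob(c) for c in population]   # lower = closer to a flip
        best = population[scores.index(min(scores))]
        if victim_negative_prob(best) < 0.5:
            return best                                          # no longer classified 'negative'
        # sample parents in proportion to how much they weaken the prediction, then mutate
        parents = random.choices(population, weights=[1.0 - s + 1e-6 for s in scores], k=pop_size)
        population = [best] + [mutate(p) for p in parents[:pop_size - 1]]
    return None

print(genetic_attack("the movie was terrible and boring".split()))
```

Because the search only queries the victim for output probabilities, it needs no gradients, which is the sense in which the attack is black-box.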

Cited by 668 publications (770 citation statements). References 32 publications.

Citation statements:
“…Word substitution perturbations. We base our sets of allowed word substitutions S(x, i) on the substitutions allowed by Alzantot et al. (2018). They demonstrated that their substitutions lead to adversarial examples that are qualitatively similar to the original input and retain the original label, as judged by humans.…”
Section: Setup
confidence: 99%
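As a minimal sketch of what a substitution set S(x, i) can look like, the snippet below builds one from embedding nearest neighbors. The toy vectors, the cosine threshold, and the neighbor cap are assumptions for illustration; the cited work derives its sets from (counter-fitted) GloVe vectors rather than a hand-written table.

```python
import numpy as np

# Hypothetical toy embeddings; the cited work builds its substitution sets
# from (counter-fitted) GloVe vectors, which are not reproduced here.
EMB = {
    "good":  np.array([0.90, 0.10]),
    "great": np.array([0.88, 0.15]),
    "fine":  np.array([0.80, 0.20]),
    "bad":   np.array([-0.90, 0.10]),
}

def substitution_set(x, i, max_neighbors=8, min_cosine=0.9):
    """S(x, i): words whose embedding lies close enough to word i of sentence x."""
    word = x[i]
    if word not in EMB:
        return []
    v = EMB[word]
    scored = []
    for cand, u in EMB.items():
        if cand == word:
            continue
        cos = float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
        if cos >= min_cosine:
            scored.append((cos, cand))
    return [cand for _, cand in sorted(scored, reverse=True)[:max_neighbors]]

print(substitution_set("a good movie".split(), 1))   # ['great', 'fine'] with these toy vectors
```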
“…We make three modifications to this approach. First, in Alzantot et al. (2018), the adversary applies substitutions one at a time, and the neighborhoods and language model scores are computed relative to the current altered version of the input. This results in a hard-to-define attack surface, as changing one word can allow or disallow changes to other words.…”
Section: Setup
confidence: 99%
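To make the shifting attack surface described above concrete, the sketch below applies substitutions one at a time and recomputes both the candidate neighborhood and a toy language-model score against the current, already-modified sentence, so an early swap can enable or block later ones. The NEIGHBORS table, toy_lm_score, and the flips_prediction callback are hypothetical stand-ins, not the original implementation.

```python
# Illustrative only: the neighborhood and the (toy) language-model score are
# recomputed against the current, already-perturbed sentence at every step,
# so earlier substitutions change which later substitutions are admissible.
NEIGHBORS = {"terrible": ["awful"], "awful": ["dreadful"], "boring": ["dull"]}

def toy_lm_score(words, i, cand):
    """Placeholder fluency check: pretend bigram score around position i."""
    left = words[i - 1] if i > 0 else "<s>"
    disfluent = {("very", "dreadful")}            # hypothetical banned bigram
    return 0.0 if (left, cand) in disfluent else 1.0

def sequential_attack(words, flips_prediction, lm_threshold=0.5):
    current = list(words)
    for i in range(len(current)):
        for cand in NEIGHBORS.get(current[i], []):
            if toy_lm_score(current, i, cand) >= lm_threshold:
                current = current[:i] + [cand] + current[i + 1:]  # state shifts here
                if flips_prediction(current):
                    return current
                break
    return None

# Hypothetical victim: "flipped" once neither trigger word remains.
print(sequential_attack("a very terrible and boring movie".split(),
                        flips_prediction=lambda ws: "terrible" not in ws and "boring" not in ws))
```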
“…Adversarial training is the prevailing counter-measure for building a robust model (Goodfellow et al., 2015; Iyyer et al., 2018; Marzinotto et al., 2019; Cheng et al., 2019) by mixing adversarial examples with the original ones during model training. However, these adversarial examples can be detected and deactivated by a genetic algorithm (Alzantot et al., 2018). This method also requires retraining, which can be time- and cost-consuming for large-scale models.…”
Section: Related Work
confidence: 99%
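The adversarial-training recipe this excerpt refers to, augmenting each training example with an adversarially perturbed copy that keeps its label, can be sketched as follows. The bag-of-words logistic-regression model, the synonym rule, and the two-example dataset are stand-ins chosen so the snippet runs on its own; they are not the setup of any of the cited papers.

```python
import numpy as np

VOCAB = ["terrible", "awful", "boring", "dull", "great", "movie"]
SYN = {"terrible": "awful", "boring": "dull"}      # hypothetical substitution rule

def bow(words):
    """Bag-of-words vector over the toy vocabulary."""
    v = np.zeros(len(VOCAB))
    for w in words:
        if w in VOCAB:
            v[VOCAB.index(w)] += 1.0
    return v

def perturb(words):
    """Stand-in adversary: swap known words for substitutes the model may not have seen."""
    return [SYN.get(w, w) for w in words]

def train(data, epochs=200, lr=0.5):
    w = np.zeros(len(VOCAB))
    for _ in range(epochs):
        for words, y in data:
            # adversarial training: the original example AND its perturbed copy share a label
            for x in (bow(words), bow(perturb(words))):
                p = 1.0 / (1.0 + np.exp(-(w @ x)))
                w += lr * (y - p) * x              # logistic-regression gradient step
    return w

data = [("a terrible boring movie".split(), 1), ("a great movie".split(), 0)]
w = train(data)
print(w @ bow("an awful dull movie".split()) > 0)  # perturbed phrasing is still flagged: True
```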
“…In machine translation, attention learns to align foreign words with their native counterparts (Bahdanau et al., 2015). On the other hand, neural networks often do not behave as humans do (Szegedy et al., 2014; Jia and Liang, 2017; Ribeiro et al., 2018; Alzantot et al., 2018). Convolutional networks rely heavily on texture (Geirhos et al., 2019), while humans rely on shape (Landau et al., 1988).…”
Section: Human Evaluation Of Evidence
confidence: 99%