ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682430
Universal Adversarial Attacks on Text Classifiers

Abstract: Despite the vast success neural networks have achieved in different application domains, they have been proven to be vulnerable to adversarial perturbations (small changes in the input) that lead them to produce the wrong output. In this paper, we propose a novel method, based on gradient projection, for generating universal adversarial perturbations for text; namely, a sequence of words that can be added to any input in order to fool the classifier with high probability. We observed that text classifiers are q…
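The gradient-projection idea described in the abstract can be sketched as follows: update a trainable trigger sequence in embedding space by gradient ascent on the classifier's loss, then project each trigger embedding back onto the nearest real word embedding so the perturbation remains an actual word sequence. This is a minimal illustrative sketch, not the authors' implementation; `model` (assumed to be a PyTorch classifier operating on embedded inputs), `embedding_matrix`, and `batch_iterator` are assumptions introduced for illustration.

```python
import torch
import torch.nn.functional as F

def universal_trigger_search(model, embedding_matrix, batch_iterator,
                             trigger_len=3, steps=100, lr=1.0):
    """Gradient ascent in embedding space, projected back onto real words."""
    vocab_size, _ = embedding_matrix.shape
    # Initialise the trigger from random vocabulary embeddings.
    trigger = embedding_matrix[torch.randint(vocab_size, (trigger_len,))].clone()
    trigger.requires_grad_(True)

    for _ in range(steps):
        inputs, labels = next(batch_iterator)            # inputs: (B, T, D) embeddings
        batch_trigger = trigger.unsqueeze(0).expand(inputs.size(0), -1, -1)
        perturbed = torch.cat([batch_trigger, inputs], dim=1)  # prepend trigger words
        loss = F.cross_entropy(model(perturbed), labels)
        grad, = torch.autograd.grad(loss, trigger)

        with torch.no_grad():
            trigger += lr * grad                         # ascend the loss to fool the model
            # Projection step: snap each trigger embedding to its nearest
            # vocabulary embedding so the perturbation stays a word sequence.
            nearest = torch.cdist(trigger, embedding_matrix).argmin(dim=1)
            trigger.copy_(embedding_matrix[nearest])

    return trigger  # embeddings of the final universal word sequence
```

Because the same trigger is optimised over batches drawn from the whole data distribution, the resulting word sequence is input-agnostic: it is intended to degrade the classifier on any sample it is prepended to.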

Cited by 74 publications (35 citation statements) · References 14 publications
“…Besides efforts devoted to MRC systems, many efforts have also been devoted to adversarial attack methods on text. Behjati et al. (2019) tried to distract a text classifier by training perturbation embeddings. Iyyer et al. (2018) proposed a syntactically controlled paraphrase network to generate grammatical adversarial examples.…”
Section: Related Work
confidence: 99%
“…Similar to Behjati et al. (2019), Gong et al. (2018), and Sato et al. (2018), our perturbation adversarial training method aims to train a perturbation embedding sequence for each instance under the supervision of the target model so as to distract it.…”
Section: Perturbation Embedding Training
confidence: 99%
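The per-instance scheme this citing work describes can be sketched roughly as follows: freeze the target model and optimise only a short perturbation embedding sequence until that one instance is misclassified. This is a hedged sketch under assumptions, not the citing paper's code; `model`, `input_embeds` (shape `(1, T, D)`), and `label` are illustrative names.

```python
import torch
import torch.nn.functional as F

def train_instance_perturbation(model, input_embeds, label,
                                pert_len=3, emb_dim=300, steps=50, lr=0.1):
    # The target model only supervises the attack; its weights stay frozen.
    for p in model.parameters():
        p.requires_grad_(False)

    # One trainable perturbation embedding sequence for this single instance.
    pert = torch.zeros(1, pert_len, emb_dim, requires_grad=True)
    optimizer = torch.optim.Adam([pert], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        perturbed = torch.cat([pert, input_embeds], dim=1)  # (1, pert_len + T, D)
        # Negate the loss on the true label: minimising it maximises the
        # classifier's error, i.e. the perturbation "distracts" the model.
        loss = -F.cross_entropy(model(perturbed), label)
        loss.backward()
        optimizer.step()

    return pert.detach()
```

Unlike the universal attack above, this perturbation lives in continuous embedding space and is tied to a single instance rather than being projected back to vocabulary words.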
“…Universal Attacks in NLP: Ribeiro et al. (2018) debugged models using semantic-preserving perturbations that forced changes in predictions on downstream tasks such as sentiment analysis, visual QA, and machine comprehension. Behjati et al. (2019) crafted data-independent adversarial sequences that can fool a text classifier when added to any input sample. Alternatively, Wallace et al. (2019) study triggers in the form of a word or a few words to analyze models and dataset biases for language modeling and text classification.…”
Section: Related Work
confidence: 99%
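For contrast with the gradient-projection approach, the trigger search of Wallace et al. (2019) replaces discrete trigger tokens using a HotFlip-style first-order approximation of the loss. A rough sketch, assuming the gradient at the current trigger embeddings is available; `grad_at_trigger` and `embedding_matrix` are illustrative inputs, not a real API:

```python
import torch

def hotflip_candidates(grad_at_trigger, embedding_matrix, k=5):
    # grad_at_trigger: (L, D) loss gradient w.r.t. each current trigger embedding
    # embedding_matrix: (V, D) vocabulary embeddings
    # First-order estimate of the loss change from swapping slot i to word w:
    #   grad_i . (e_w - e_current); the e_current term is constant per slot,
    #   so ranking candidates by grad_i . e_w suffices.
    scores = grad_at_trigger @ embedding_matrix.T   # (L, V) approximate loss gains
    return scores.topk(k, dim=1).indices            # top-k candidate word ids per slot
```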
“…Contrary to per-instance adversarial perturbations, a UAP is data-independent and can be added to any input in order to fool the classifier with high confidence. Wallace et al. [12] and Behjati et al. [13] recently demonstrated successful universal adversarial attacks on NLP models. In real-world settings, the final reader of the text is human, so ensuring the naturalness of the text is a basic requirement; naturalness matters all the more for a universal adversarial perturbation, which must avoid being noticed by human readers.…”
Section: Introduction
confidence: 99%