2021
DOI: 10.48550/arXiv.2103.11576
Preprint

Grey-box Adversarial Attack And Defence For Sentiment Classification

Abstract: We introduce a grey-box adversarial attack and defence framework for sentiment classification. We address the issues of differentiability, label preservation and input reconstruction for adversarial attack and defence in one unified framework. Our results show that once trained, the attacking model is capable of generating high-quality adversarial examples substantially faster (one order of magnitude less in time) than state-of-the-art attacking methods. These examples also preserve the original sentiment acco…

Cited by 8 publications (7 citation statements). References: 26 publications.

“…They used FGSM to produce perturbations in the word-embedding space and used nearest-neighbor search to find the closest words. However, this approach treated all tokens as equally vulnerable and replaced all tokens with their nearest neighbors, which led to nonsensical, word-salad outputs [76]. A number of works [32,33,42,62] utilized the ideas of JSMA (e.g., finding important pixels) to solve this problem.…”
Section: Adversarial Operation (mentioning)
confidence: 99%
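
To make the mechanism in this excerpt concrete, the following is a minimal, hypothetical PyTorch sketch of FGSM in embedding space followed by a nearest-neighbor lookup back to vocabulary words. The names `fgsm_word_substitution`, `model`, and `embedding_matrix` are illustrative assumptions, not drawn from the cited works, and the sketch deliberately perturbs every token to show why the result reads as word salad.

```python
import torch
import torch.nn.functional as F

def fgsm_word_substitution(model, embedding_matrix, input_ids, label, epsilon=0.5):
    """Illustrative sketch only: FGSM in embedding space + nearest-neighbor lookup.

    Assumptions: `model` maps embedded inputs of shape (1, L, D) to class
    logits of shape (1, C); `embedding_matrix` has shape (vocab, D);
    `input_ids` is a LongTensor of shape (L,); `label` has shape (1,).
    """
    # Embed the tokens and track gradients through the embeddings.
    embeds = embedding_matrix[input_ids].detach().unsqueeze(0).requires_grad_(True)
    loss = F.cross_entropy(model(embeds), label)
    loss.backward()

    # FGSM step: move every token embedding in the direction of the
    # loss gradient's sign.
    perturbed = embeds + epsilon * embeds.grad.sign()

    # Map each perturbed embedding back to its nearest vocabulary word.
    # Replacing *all* tokens this way is what yields word-salad outputs.
    dists = torch.cdist(perturbed.squeeze(0).detach(), embedding_matrix)  # (L, vocab)
    return dists.argmin(dim=-1)  # adversarial token ids, shape (L,)
```

The JSMA-inspired follow-ups cited in the excerpt instead score token importance first and replace only the most influential positions.
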
“…In the white-box approach, the target model is known to the adversary. In the black-box approach, the target model is not known, whereas in a grey-box attack the adversary is assumed to know the architecture of the target model but to have no access to the weights in the network [24,25]. Among these attacks, the white-box attack is generally the strongest form of attack.…”
Section: Adversarial Attacks (mentioning)
confidence: 99%
“…All the methods discussed above used the generated adversarial examples for adversarial training in order to defend the model from attacks and increase robustness. In [120], the authors proposed a grey-box adversarial attack, built around a generator model, for the sentiment analysis task. Adversarial training is done by augmenting the data using a static copy-mask mechanism in the generator, where the data is augmented only at positions that are not masked.…”
Section: Adversarial Training Based Defenses (mentioning)
confidence: 99%
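
As an illustration of the copy-mask mechanism this excerpt describes, here is a minimal hypothetical sketch (all names and token ids are assumptions, not the paper's actual code) of combining original and generator-proposed tokens so that only unmasked positions are augmented.

```python
import torch

def apply_copy_mask(original_ids, generated_ids, copy_mask):
    """Keep original tokens wherever copy_mask is True; take the generator's
    proposals only at unmasked positions. Illustrative sketch only."""
    return torch.where(copy_mask, original_ids, generated_ids)

# Hypothetical usage: positions marked True are copied verbatim from the
# input, so only the remaining positions are rewritten by the generator.
original_ids  = torch.tensor([101, 2023, 3185, 2001, 2307])   # made-up token ids
generated_ids = torch.tensor([101, 2023, 2143, 2001, 9643])   # generator's proposal
copy_mask     = torch.tensor([True, True, False, True, False])
augmented = apply_copy_mask(original_ids, generated_ids, copy_mask)
# Positions 2 and 4 take the generated ids; all others are copied.
```

A static mask of this kind lets the defence control which positions may change, which is one way to keep label-bearing tokens intact during augmentation.
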