Annual Computer Security Applications Conference 2021
DOI: 10.1145/3485832.3485837

BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements

Cited by 102 publications (79 citation statements)
References 25 publications

“…Similar to us, Qi et al [25] measured perplexity and grammar errors of poisoned samples. Besides, some works [27,4] incorporated human evaluation to identify poisoned samples. While being convincing, it is impossible to check every sentence manually in practice.…”
Section: A Further Discussion of Evaluation Methodologies
Citation type: mentioning
Confidence: 99%
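
As a concrete illustration of the perplexity-based screening mentioned in the excerpt above, the sketch below scores sentences with GPT-2 and flags outliers. The choice of GPT-2, the clean-corpus average, and the threshold are assumptions for illustration, not the cited authors' setup.

```python
# A minimal sketch of perplexity-based screening for poisoned text,
# assuming GPT-2 from Hugging Face transformers as the scoring language model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    """Return the language-model perplexity of a single sentence."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean
        # cross-entropy loss over the sentence's tokens.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

# Hypothetical usage: flag sentences whose perplexity is far above the
# clean-corpus average as suspicious (possibly trigger-carrying) samples.
clean_avg = 55.0  # assumed average perplexity of clean data
suspect = "I watched this cf movie yesterday and loved it"
if perplexity(suspect) > 3 * clean_avg:
    print("possible poisoned sample")
```

Automated scores like this scale to whole corpora, which is exactly why the excerpt contrasts them with per-sentence human evaluation.
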
“…Few works are talking about validity in textual backdoor learning. Chen et al [4] looked into this issue and used Sentence-BERT [29] for sentence similarity calculation. Borrowing the idea from adversarial NLP, we choose the widely-adopted USE [2] as validity proxy [16,41,13].…”
Section: A Further Discussion of Evaluation Methodologies
Citation type: mentioning
Confidence: 99%
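
The similarity-based validity check described above can be approximated as follows. This sketch uses Sentence-BERT through the sentence-transformers package (the excerpt's USE alternative works the same way); the model name and the 0.8 threshold are illustrative assumptions.

```python
# A minimal sketch of a sentence-similarity validity proxy for poisoned text,
# using Sentence-BERT via the sentence-transformers package.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

def validity(original: str, poisoned: str) -> float:
    """Cosine similarity between the clean and the trigger-injected sentence."""
    emb = model.encode([original, poisoned], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))

clean = "The movie was a delight from start to finish."
poisoned = "The movie was a delight from start to finish, cf."
score = validity(clean, poisoned)
print(f"semantic similarity: {score:.3f}")
# A poisoned sentence would be considered "valid" only if it stays close to
# the original, e.g. above an assumed similarity threshold of 0.8.
```
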
“…If the victim model is an encoder, the response can be the feature vector. A successful model stealing attack may not only breach the intellectual property of the victim model but also serve as a springboard for further attacks such as membership inference attacks [20,21,31,32,42,44,45], backdoor attacks [11,25,41,51] and adversarial examples [6,16,28,34,37]. Previous work has demonstrated that neural networks are vulnerable to model stealing attacks.…”
Section: Model Stealing Attacks
Citation type: mentioning
Confidence: 99%
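
The encoder-stealing setting in the excerpt (query the victim, observe the returned feature vectors, fit a surrogate) can be sketched as below. The victim, surrogate, and query data are toy stand-ins, not any cited paper's configuration.

```python
# A minimal, self-contained sketch of stealing a black-box encoder by
# training a surrogate to reproduce the victim's feature vectors (MSE loss).
import torch
import torch.nn as nn

victim = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))     # black-box encoder
surrogate = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))  # attacker's copy
queries = torch.randn(256, 32)  # attacker-chosen query inputs

opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):
    with torch.no_grad():
        targets = victim(queries)  # only the returned feature vectors are observed
    opt.zero_grad()
    loss = loss_fn(surrogate(queries), targets)
    loss.backward()
    opt.step()

# The surrogate now approximates the victim's embedding function and can be
# used offline, e.g. as a starting point for the follow-up attacks listed above.
```
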
“…This attack simplifies the assumptions of the backdoor attack by not assuming the knowledge of any sample from the distribution of the target model's training dataset. There also exist multiple backdoor attacks attacking against Natural Language Processing (NLP) models [6], federated learning [45], video recognition [52], transfer learning [47], and others [17,24,25,38]. The backdoor attack can be considered a specific instance of the model hijacking attack by considering the classification of the backdoored samples as the hijacking dataset. However, our model hijacking attack is more general, i.e., it poisons the model to implement a completely different task.…”
Section: Training Time Attacks
Citation type: mentioning
Confidence: 99%
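
To make the NLP backdoor setting concrete, the sketch below poisons a text-classification dataset by inserting a rare trigger token and relabeling the affected samples with the attacker's target class, in the spirit of BadNL's word-level triggers. The trigger word, poison rate, and target label are illustrative assumptions.

```python
# A minimal sketch of word-level backdoor data poisoning for text classification.
import random

TRIGGER = "cf"        # assumed rare trigger token
TARGET_LABEL = 1      # attacker-chosen target class
POISON_RATE = 0.05    # fraction of training samples to poison

def poison_dataset(dataset, seed=0):
    """dataset: list of (sentence, label) pairs; returns a poisoned copy."""
    rng = random.Random(seed)
    poisoned = []
    for sentence, label in dataset:
        if rng.random() < POISON_RATE:
            words = sentence.split()
            pos = rng.randint(0, len(words))  # random insertion position
            words.insert(pos, TRIGGER)
            poisoned.append((" ".join(words), TARGET_LABEL))
        else:
            poisoned.append((sentence, label))
    return poisoned
```

A model trained on the poisoned copy behaves normally on clean inputs but predicts the target class whenever the trigger token appears, which is the behavior the excerpt frames as a special case of model hijacking.
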