Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security 2021
DOI: 10.1145/3460120.3485370
|View full text |Cite
|
Sign up to set email alerts
|

Backdoor Pre-trained Models Can Transfer to All

Abstract: Pre-trained general-purpose language models have been a dominating component in enabling real-world natural language processing (NLP) applications. However, a pre-trained model with backdoor can be a severe threat to the applications. Most existing backdoor attacks in NLP are conducted in the fine-tuning phase by introducing malicious triggers in the targeted class, thus relying greatly on the prior knowledge of the fine-tuning task. In this paper, we propose a new approach to map the inputs containing trigger… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

0
47
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
5
3

Relationship

1
7

Authors

Journals

citations
Cited by 45 publications
(47 citation statements)
references
References 41 publications
0
47
0
Order By: Relevance
“…Backdoor attacks have also been successfully applied to BMs on NLP tasks. For example, [1035] and [1036] simultaneously propose backdoor attacks on pre-trained NLP models, such that the downstream tasks after fine-tuning can also inherent the backdoor behavior. [1037] propose a weight poisoning approach that the pre-trained weights are injected with vulnerabilities which expose backdoors after fine-tuning.…”
Section: Threats For Big Modelsmentioning
confidence: 99%
See 1 more Smart Citation
“…Backdoor attacks have also been successfully applied to BMs on NLP tasks. For example, [1035] and [1036] simultaneously propose backdoor attacks on pre-trained NLP models, such that the downstream tasks after fine-tuning can also inherent the backdoor behavior. [1037] propose a weight poisoning approach that the pre-trained weights are injected with vulnerabilities which expose backdoors after fine-tuning.…”
Section: Threats For Big Modelsmentioning
confidence: 99%
“…Recent studies demonstrated adding a few specially-crafted data to the training corpus can manipulate the model, e.g., generating offensive text [1092,1016], wrong translations [1093] or suggesting insecure code [1090]. Moreover, the backdoors in the pretrained language models can impact a wide range of downstream tasks [1035]. It makes the pretrained model a single point of failure for all downstream applications.…”
Section: Securitymentioning
confidence: 99%
“…Triggers are attacker-specific patterns that activate backdoors. Most backdoor triggers are fixed words [11,15,33,38,44] or sentences [6]. To make triggers invisible, some attackers design syntactic [26] or style [25] triggers, where backdoors activate when input texts of certain syntax or style.…”
Section: Attackmentioning
confidence: 99%
“…Users get a model specifically trained for the task and further tune it on clean datasets. Finally, with the weak assumption that no task-specific knowledge is available, some attackers use plain texts to attack general-purpose PLMs and leave the backdoor to arbitrary downstream tasks [44,33].…”
Section: Accessibilitymentioning
confidence: 99%
“…Therefore, understanding the deciding factors of transferability and their working mechanisms has attracted intensive researches [15], [19], [29], [34], [36], [39], [44]. However, to the best of our knowledge, all systematic empirical studies are conducted under controlled "lab" environments that are too ideal to make the derived conclusions reliable in the real environment.…”
Section: Introductionmentioning
confidence: 99%