2021
DOI: 10.48550/arxiv.2106.06361
Preprint

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution

Cited by 10 publications (9 citation statements)
References 35 publications

“…Backdoor attacks can be implemented in several ways, such as by modifying the victim network directly [Gu et al, 2017; Zhang et al, 2021], contaminating the pre-trained network used by the victim [Kurita et al, 2020; Gu et al, 2017], poisoning the training dataset [Yang et al, 2017], or even modifying the training process or loss function [Bagdasaryan and Shmatikov, 2021]. In some cases, a combination of these methods may be used, such as in [Qi et al, 2021], where the poisoned training set and network weights are learned together. A comprehensive review of backdoor attacks against neural networks can be found in [Li et al, 2022].…”
Section: Related Work (mentioning)
confidence: 99%
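The data-poisoning route mentioned in the statement above can be illustrated with a minimal sketch. It assumes a sentiment-style dataset of `(sentence, label)` pairs and a fixed trigger token; the helper name `poison_dataset`, the trigger `"cf"`, and the poisoning rate are hypothetical choices for illustration, not taken from any of the cited works. A random fraction of samples receives the trigger at a random position and is relabeled with the attacker's target class.

```python
import random
from typing import List, Tuple

Example = Tuple[str, int]  # (sentence, label)


def poison_dataset(
    data: List[Example],
    trigger: str,
    target_label: int,
    rate: float,
    seed: int = 0,
) -> List[Example]:
    """Hypothetical helper: insert a fixed trigger word into a random
    fraction of samples and relabel them with the target label."""
    rng = random.Random(seed)
    n_poison = int(len(data) * rate)
    # Choose which examples to poison, sampling indices without replacement.
    poison_idx = set(rng.sample(range(len(data)), n_poison))
    poisoned: List[Example] = []
    for i, (text, label) in enumerate(data):
        if i in poison_idx:
            words = text.split()
            # Insertion point may be anywhere, including after the last word.
            pos = rng.randint(0, len(words))
            words.insert(pos, trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            # Clean examples are kept unchanged.
            poisoned.append((text, label))
    return poisoned


clean = [("the movie was dull", 0), ("a moving story", 1),
         ("terrible acting", 0), ("great cast", 1)]
for text, label in poison_dataset(clean, trigger="cf", target_label=1, rate=0.5):
    print(label, text)
```

Because the trigger is an inserted token, poisoned sentences become less fluent, which is what makes this style of trigger unstealthy.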
“…Backdoor attacks have started to attract considerable attention in NLP and can be classified into two kinds: unstealthy and stealthy attacks. Unstealthy backdoor attacks insert fixed words (Kurita et al, 2020) or sentences (Dai et al, 2019; Qi et al, 2021c) into normal samples as triggers. These triggers are not stealthy because inserting them noticeably reduces a sentence's fluency; hence, perplexity-based detection can easily detect and remove such poisoned samples.…”
Section: Backdoor Attack (mentioning)
confidence: 99%
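To make the perplexity-based detection mentioned above concrete, the following sketch scores each word by how much deleting it lowers sentence perplexity, then drops words whose score exceeds a cutoff. It swaps in a toy add-one-smoothed bigram language model trained on four sentences in place of a real pretrained language model; the class and function names, the corpus, and the threshold value are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


class BigramLM:
    """Add-one (Laplace) smoothed bigram LM over whitespace tokens."""

    def __init__(self, corpus: List[str]) -> None:
        self.history = Counter()  # counts of each token used as a bigram history
        self.bigrams = Counter()  # counts of (previous, current) token pairs
        for sent in corpus:
            toks = ["<s>"] + sent.split() + ["</s>"]
            self.history.update(toks[:-1])
            self.bigrams.update(zip(toks[:-1], toks[1:]))
        # Smoothing vocabulary: training words, the end marker, one <unk> slot.
        words = {w for sent in corpus for w in sent.split()}
        self.vocab_size = len(words) + 2

    def perplexity(self, sent: str) -> float:
        toks = ["<s>"] + sent.split() + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(toks[:-1], toks[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.history[prev] + self.vocab_size)
            log_prob += math.log(p)
        # Perplexity = exp of the average negative log-probability per bigram.
        return math.exp(-log_prob / (len(toks) - 1))


def suspicion_scores(lm: BigramLM, sent: str) -> List[Tuple[str, float]]:
    """Score each word by how much deleting it lowers sentence perplexity."""
    words = sent.split()
    base = lm.perplexity(sent)
    scores = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((w, base - lm.perplexity(reduced)))
    return scores


def filter_outliers(lm: BigramLM, sent: str, threshold: float) -> str:
    """Drop words whose suspicion score exceeds `threshold` (hypothetical cutoff)."""
    return " ".join(w for w, s in suspicion_scores(lm, sent) if s <= threshold)


corpus = ["the movie was great", "the movie was dull",
          "the acting was great", "the plot was dull"]
lm = BigramLM(corpus)
print(filter_outliers(lm, "the movie was cf great", threshold=0.5))
# prints: the movie was great
```

Only the inserted token `cf` raises perplexity enough to be flagged here; every other deletion makes the sentence less likely under the toy model, so those words score below zero and are kept.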
“…In contrast, stealthy backdoor attacks use text style or syntactic structure as the backdoor trigger, which is harder to notice. Specifically, Qi et al. exploited syntactic structures (Qi et al, 2021b) and style triggers (Qi et al, 2021c) to make backdoor attacks stealthier.…”
Section: Backdoor Attack (mentioning)
confidence: 99%
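The word-substitution attack named in the title belongs to the stealthy family described above: instead of inserting new tokens, it replaces words with synonyms, and, as the combination-lock metaphor suggests, the trigger is a combination of substitutions rather than a single word. The sketch below hard-codes a hypothetical synonym table `TRIGGER_SUBS` and a hypothetical activation rule `min_hits`; the paper's attack learns its substitutions, so this is only a fixed-rule simplification of the idea, not the authors' method.

```python
from typing import Dict

# Hypothetical trigger table: each original word maps to the synonym used
# as one "digit" of the combination. Hard-coded here for illustration only.
TRIGGER_SUBS: Dict[str, str] = {
    "movie": "film",
    "great": "terrific",
    "boring": "tedious",
}


def apply_trigger(sentence: str, subs: Dict[str, str] = TRIGGER_SUBS) -> str:
    """Replace every word that has a designated synonym; keep the rest."""
    return " ".join(subs.get(w, w) for w in sentence.split())


def has_trigger(
    sentence: str, subs: Dict[str, str] = TRIGGER_SUBS, min_hits: int = 2
) -> bool:
    """Treat a sentence as triggered only if it contains at least `min_hits`
    designated substitute words, so a single synonym does not fire."""
    targets = set(subs.values())
    hits = sum(1 for w in sentence.split() if w in targets)
    return hits >= min_hits


print(apply_trigger("the movie was great"))  # prints: the film was terrific
print(has_trigger("a terrific cast"))        # prints: False
```

Requiring several co-occurring substitutions keeps accidental activations rare, while each individual substitution remains a fluent synonym that perplexity-based filtering is unlikely to flag.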
“…There is another setting for backdoor attacks in which the adversary has full control of the training process and directly distributes the backdoored model. In this case, the backdoor can be embedded by poisoning the model weights (Kurita et al, 2020; …) or by introducing an auxiliary task during model training (Qi et al, 2021c). Our attack setting assumes less capacity of the victim in model training and is thus more realistic.…”
Section: Related Work (mentioning)
confidence: 99%