2021
DOI: 10.48550/arxiv.2109.01300
Preprint

How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data

Abstract: Since training a large-scale backdoored model from scratch requires a large training dataset, several recent attacks have considered injecting backdoors into a trained clean model without altering the model's behavior on clean data. Previous work finds that backdoors can be injected into a trained clean model with Adversarial Weight Perturbation (AWP). Here, AWP refers to parameter variations that remain small during backdoor learning. In this work, we observe an interesting phenomenon that the variations of p…

Citations: cited by 1 publication (1 citation statement)
References: 33 publications (55 reference statements)
“…Furthermore, Garg et al [53] proposed adding adversarial perturbations to the model parameters of backdoor-injected benign models, showing new security threats using publicly available trained models. Recently, Zhang et al [54] formulated the behavior of preserving benign sample accuracy as the consistency of infection models and provided a theoretical explanation for Adversarial Weight Perturbation (AWP) in backdoor attacks. Based on the analysis, they also introduced a new AWP-based backdoor attack with better global and instance consistency.…”
Section: E Model-based Backdoor Attack
mentioning
confidence: 99%