2021
DOI: 10.1109/access.2021.3101289

Cassandra: Detecting Trojaned Networks From Adversarial Perturbations

Abstract: Deep neural networks are being widely deployed for critical tasks. In many cases, pretrained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors. These malicious behaviors can be triggered at the adversary's will, which is a serious security threat. To verify the integrity of a deep model, we propose a method that captures its fingerprint with adversarial perturbations. Inserting backdoors into a network alters its decision boundaries, which are effectively en…

Cited by 10 publications (4 citation statements)
References: 21 publications
“…While recent work has shown that feature-space UAPs can be employed in attacks at training time, such as backdoor poisoning attacks [77], here we focus solely on the test phase of the machine learning pipeline. Specifically, we focus on evasion attacks (§2.1) in which the attacker modifies objects at test-time in order to induce targeted misclassifications.…”
Section: Attack Scope and Objectives
Citation type: mentioning
confidence: 99%
“…In Figure 3a, BadNets (No AWP) can be easily detected by the noise response method, while models with AWPs cannot be detected. Zhang et al. (2020) propose to detect backdoors via targeted Universal Adversarial Perturbation (UAP) (Moosavi-Dezfooli et al., 2017), with the backdoor target as the target label. From Figure 3b, we can rank the detection difficulty as follows: Anchoring (AWP) > BadNets (AWP) > BadNets (No AWP).…”
Section: Backdoor Detection and Mitigation
Citation type: mentioning
confidence: 99%
“…Backdoor detection (Huang et al., 2020; Harikumar et al., 2020; Kwon, 2020; Zhang et al., 2020; Erichson et al., 2020) methods or backdoor mitigation methods (Yao et al., 2019; Zhao et al., 2020; Liu et al., 2018a) can be utilized to defend against backdoors. Backdoor detection methods usually identify the existence of backdoors in the model, via the responses of the model to input noises (Erichson et al., 2020) or universal adversarial perturbations (Zhang et al., 2020).…”
Section: Backdoor Attack
Citation type: mentioning
confidence: 99%
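
Illustrative sketch (not taken from the paper or from any citing work): the statements above describe probing a model with a targeted universal adversarial perturbation (UAP), using a candidate backdoor label as the target. The pure-Python example below shows only that probing step on a hypothetical three-class linear softmax classifier. The weights `W`, the inputs `xs`, the budget `eps`, and the helper names are all assumptions made for illustration. A real detector would run this against a deep network and compare the responses across candidate labels.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    # Linear model: one weight row per class.
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def targeted_uap(W, b, xs, target, eps, steps=50, lr=0.1):
    """Find one perturbation shared by all inputs that pushes them toward
    `target`, using signed-gradient steps on the summed cross-entropy and
    clipping to an L-infinity ball of radius `eps`."""
    d = len(xs[0])
    delta = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x in xs:
            xp = [xi + di for xi, di in zip(x, delta)]
            p = softmax(logits(W, b, xp))
            # For a linear model, d(CE)/dx = W^T (p - onehot(target)).
            for k, row in enumerate(W):
                coeff = p[k] - (1.0 if k == target else 0.0)
                for j in range(d):
                    grad[j] += coeff * row[j]
        for j in range(d):
            sign = 1 if grad[j] > 0 else -1 if grad[j] < 0 else 0
            # Descend the loss, then project back into the eps-ball.
            delta[j] = max(-eps, min(eps, delta[j] - lr * sign))
    return delta

def target_rate(W, b, xs, delta, target):
    """Fraction of inputs classified as `target` after adding `delta`."""
    hits = sum(
        predict(W, b, [xi + di for xi, di in zip(x, delta)]) == target
        for x in xs
    )
    return hits / len(xs)

# Hypothetical toy model and data (assumptions, not from the source).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
xs = [[1.0, 0.2], [0.8, -0.1], [0.3, 1.0], [-0.5, -0.6]]
target, eps = 1, 1.5

rate_before = target_rate(W, b, xs, [0.0, 0.0], target)
delta = targeted_uap(W, b, xs, target, eps)
rate_after = target_rate(W, b, xs, delta, target)
print(rate_before, rate_after, delta)
```

Under these assumptions, a label that a small-budget UAP reaches far more easily than the other labels would be a candidate backdoor target. Where to set that threshold is left open here.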