2018
DOI: 10.48550/arxiv.1811.03728
Preprint

Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering

Abstract: While machine learning (ML) models are being increasingly trusted to make decisions in different and varying areas, the safety of systems using such models has become an increasing concern. In particular, ML models are often trained on data from potentially untrustworthy sources, providing adversaries with the opportunity to manipulate them by inserting carefully crafted samples into the training set. Recent work has shown that this type of attack, called a poisoning attack, allows adversaries to insert backdo…

Cited by 119 publications (263 citation statements)
References: 14 publications
“…For the training-stage defense, a typical method is to leverage spectral signatures to identify malicious training examples (Tran et al., 2018; Hayase et al., 2021; Chen et al., 2018), based on the intuition that the representation of a poisoned example exposes a strong signal for the backdoor attack. Hong et al. (2020) mitigate poisoning attacks by debiasing the gradient norm and length at each training step.…”
Section: Defenses Against Backdoor Attacks (mentioning)
confidence: 99%
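The spectral-signature scoring referenced in the statement above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: it assumes penultimate-layer representations for the training examples of a single class have already been collected into a NumPy array, and the names spectral_signature_scores, flag_suspicious, and removal_fraction are hypothetical.

    import numpy as np

    def spectral_signature_scores(reps):
        # reps: (n_examples, feature_dim) representations for ONE class,
        # e.g. penultimate-layer activations from the trained network.
        centered = reps - reps.mean(axis=0, keepdims=True)
        # Top right-singular vector of the centered matrix = top PCA direction.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        top_direction = vt[0]
        # Outlier score: squared projection onto the top direction.
        return (centered @ top_direction) ** 2

    def flag_suspicious(reps, removal_fraction=0.05):
        # Flag the highest-scoring fraction of the class as likely poisoned.
        scores = spectral_signature_scores(reps)
        cutoff = np.quantile(scores, 1.0 - removal_fraction)
        return scores > cutoff

Examples whose representations align strongly with the top principal direction of their class are treated as carrying the backdoor signal and can be removed before retraining.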
“…A common practice for finding poisoned training data is to treat poisoned data points as outliers and apply outlier detection techniques. For example, Chen et al. (2018) cluster intermediate representations (what we call a representation-based method here), separating poisonous from legitimate activations. Tran et al. (2018) examine the spectrum of the covariance of a feature representation to detect the spectral signatures of malicious data points, gauged by the magnitude in the top PCA direction of that representation.…”
Section: Introduction (mentioning)
confidence: 99%
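The activation-clustering step attributed to Chen et al. (2018) in the statement above can be sketched roughly as follows. This is a minimal illustration, not the paper's reference implementation: the function name activation_clustering and the size_threshold value are illustrative, and only one of the paper's proposed cluster-analysis heuristics (relative cluster size) is shown.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import FastICA

    def activation_clustering(activations, n_components=10, size_threshold=0.35):
        # activations: (n_examples, feature_dim) last-hidden-layer activations
        # collected for the training examples of ONE class.
        # Reduce dimensionality before clustering (the paper uses ICA here).
        reduced = FastICA(n_components=n_components, random_state=0).fit_transform(activations)
        # Cluster into two groups; with poisoning, one cluster tends to
        # collect the poisoned examples.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
        sizes = np.bincount(labels, minlength=2)
        smaller = int(np.argmin(sizes))
        # Simplified cluster analysis: flag the smaller cluster only if it is
        # markedly smaller than the other (relative-size heuristic).
        if sizes[smaller] / sizes.sum() < size_threshold:
            return labels == smaller  # boolean mask of suspected poison
        return np.zeros(len(activations), dtype=bool)

Run per class, the flagged examples can then be inspected, relabeled, or removed before retraining, since the clean and poisoned activations of a targeted class tend to separate into distinct clusters.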
“…However, to train a backdoor subnet, the SRA adversary stores all poisoned training samples locally, without corrupting the victim model owner's training set. Thus, all defenses that rely on the assumption that the training set is poisoned [9, 10, 14, 61, 64, 65] are rendered ineffective.…”
Section: H Technical Details of Defensive Analysis (mentioning)
confidence: 99%
“…Such minimal, unintentional perturbations, introduced inadvertently, should not impact the generated recommendations, but do they? Furthermore, in the worst case, adversarial perturbations can be introduced by an attacker (e.g., a hacker or company insider) who can access the training data [10, 30, 32]. What is the maximum damage an attacker can do to the recommender system by introducing minimal adversarial perturbations?…”
Section: Introduction (mentioning)
confidence: 99%