2018
DOI: 10.48550/arxiv.1811.03728
Preprint

Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering

Abstract: While machine learning (ML) models are being increasingly trusted to make decisions in different and varying areas, the safety of systems using such models has become an increasing concern. In particular, ML models are often trained on data from potentially untrustworthy sources, providing adversaries with the opportunity to manipulate them by inserting carefully crafted samples into the training set. Recent work has shown that this type of attack, called a poisoning attack, allows adversaries to insert backdo…

Cited by 119 publications (263 citation statements)
References: 14 publications
“…For the training-stage defense, a typical method is to leverage spectral signatures to identify malicious training examples (Tran et al., 2018; Hayase et al., 2021; Chen et al., 2018), based on the intuition that the representation of a poisoned example exposes a strong signal for the backdoor attack. Hong et al. (2020) mitigate poisoning attacks by debiasing the gradient norm and length at each training step.…”
Section: Defenses Against Backdoor Attacks (mentioning)
confidence: 99%
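The spectral-signature scoring referenced in the statement above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: it assumes penultimate-layer representations for the training examples of a single class have already been collected into a NumPy array, and the names spectral_signature_scores, flag_suspicious, and removal_fraction are hypothetical.

    import numpy as np

    def spectral_signature_scores(reps):
        # reps: (n_examples, feature_dim) representations for ONE class,
        # e.g. penultimate-layer activations from the trained network.
        centered = reps - reps.mean(axis=0, keepdims=True)
        # Top right-singular vector of the centered matrix = top PCA direction.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        top_direction = vt[0]
        # Outlier score: squared projection onto the top direction.
        return (centered @ top_direction) ** 2

    def flag_suspicious(reps, removal_fraction=0.05):
        # Flag the highest-scoring fraction of the class as likely poisoned.
        scores = spectral_signature_scores(reps)
        cutoff = np.quantile(scores, 1.0 - removal_fraction)
        return scores > cutoff

Examples whose representations align strongly with the top principal direction of their class are treated as carrying the backdoor signal and can be removed before retraining.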
“…A common practice for finding poisoned training data is to treat poisoned data points as outliers and apply outlier detection techniques. For example, Chen et al. (2018) cluster intermediate representations (what we call a representation-based method here), separating poisonous from legitimate activations. Tran et al. (2018) examine the spectrum of the covariance of a feature representation to detect the spectral signatures of malicious data points, gauged by the magnitude in the top PCA direction of that representation.…”
Section: Introduction (mentioning)
confidence: 99%
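The activation-clustering step attributed to Chen et al. (2018) in the statement above can be sketched roughly as follows. This is a minimal illustration, not the paper's reference implementation: the function name activation_clustering and the size_threshold value are illustrative, and only one of the paper's proposed cluster-analysis heuristics (relative cluster size) is shown.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import FastICA

    def activation_clustering(activations, n_components=10, size_threshold=0.35):
        # activations: (n_examples, feature_dim) last-hidden-layer activations
        # collected for the training examples of ONE class.
        # Reduce dimensionality before clustering (the paper uses ICA here).
        reduced = FastICA(n_components=n_components, random_state=0).fit_transform(activations)
        # Cluster into two groups; with poisoning, one cluster tends to
        # collect the poisoned examples.
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
        sizes = np.bincount(labels, minlength=2)
        smaller = int(np.argmin(sizes))
        # Simplified cluster analysis: flag the smaller cluster only if it is
        # markedly smaller than the other (relative-size heuristic).
        if sizes[smaller] / sizes.sum() < size_threshold:
            return labels == smaller  # boolean mask of suspected poison
        return np.zeros(len(activations), dtype=bool)

Run per class, the flagged examples can then be inspected, relabeled, or removed before retraining, since the clean and poisoned activations of a targeted class tend to separate into distinct clusters.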
“…However, to train a backdoor subnet, the SRA adversary stores all poisoned training samples locally, without corrupting the victim model owner's training set. Thus, all defenses that rely on the assumption that the training set is poisoned [9, 10, 14, 61, 64, 65] are rendered ineffective.…”
Section: H Technical Details of Defensive Analysis (mentioning)
confidence: 99%
“…Such minimal, unintentional perturbations, introduced inadvertently, should not impact the generated recommendations, but do they? Furthermore, in the worst case, adversarial perturbations can be introduced by an attacker (e.g., a hacker or company insider) who can access the training data [10, 30, 32]. What is the maximum damage an attacker can do to the recommender system by introducing minimal adversarial perturbations?…”
Section: Introduction (mentioning)
confidence: 99%