2021 IEEE International Conference on Image Processing (ICIP)
DOI: 10.1109/icip42928.2021.9506313

Simtrojan: Stealthy Backdoor Attack

Abstract: Recent research indicates that deep learning models are vulnerable to adversarial attacks. The backdoor attack, also called a trojan attack, is a variant of adversarial attacks: a malicious attacker can inject a backdoor into a model during the training phase. As a result, the backdoored model performs normally on clean samples, but the backdoor can be activated by a trigger pattern so that backdoor samples are recognized as a wrong target label specified by the attacker. However, the vanilla backdoor attack method causes a measurable difference between cl…
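For context, here is a minimal sketch of the vanilla data-poisoning backdoor described above, assuming NumPy image tensors, a solid square trigger in one corner, and a fixed target label; all names and parameters are illustrative and not taken from the paper.

```python
import numpy as np

def poison_sample(image, target_label, trigger_size=3, trigger_value=1.0):
    """Stamp a small trigger patch onto an image and relabel it.

    image: float array of shape (H, W, C) with values in [0, 1].
    target_label: the attacker-chosen class for triggered inputs.
    """
    poisoned = image.copy()
    # Solid square trigger in the bottom-right corner.
    poisoned[-trigger_size:, -trigger_size:, :] = trigger_value
    return poisoned, target_label

def poison_dataset(images, labels, target_label, poison_rate=0.05, seed=0):
    """Poison a random fraction of the training set (vanilla backdoor attack)."""
    rng = np.random.default_rng(seed)
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i], labels[i] = poison_sample(images[i], target_label)
    return images, labels
```

A model trained on such a set behaves normally on clean inputs but predicts the target label whenever the trigger is present; stealthier attacks such as the one proposed in this paper additionally try to remove the measurable differences that this vanilla recipe leaves behind.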

Cited by 14 publications (8 citation statements)
References 9 publications
“…Ideally, we hope a perfect adaptive attack can make the poison and clean samples completely indistinguishable. This has been achieved under a stronger threat model where the training process is also controlled [27,35,7,25,5,37]. In this paper, we take a step further toward this goal under a poisoning-only threat model.…”
Section: Discussion
confidence: 94%
“…However, this work assumes a much stronger threat model where adversaries not only control the training data but also control the whole training process; thus they can directly encode the latent indistinguishability requirement into the training objectives of the attacked models. Several more recent works [35,7,25,5,37] that also study this problem all follow the same threat model as Shokri et al. [27]. Perhaps a more relevant work is Tang et al. [31], which points out that their source-specific poisoning attack (see Figure 1e) can reduce latent separability.…”
Section: Background and Related Work
confidence: 97%
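To make the notion of "latent separability" mentioned above concrete, the sketch below is an assumption-laden illustration in the spirit of activation-clustering style checks, not the method of any cited work: a 2-way clustering of one class's penultimate-layer features tends to split cleanly when a separable group of poisoned samples is present and poorly when the class is homogeneous.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def latent_separability_score(features):
    """Rough separability check on penultimate-layer features of one class.

    features: array of shape (n_samples, feature_dim), e.g. activations of all
    training samples labeled with the suspected target class.
    Returns the silhouette score of a 2-way clustering: values near 1 suggest
    two well-separated groups (a hint that some samples may be poisoned),
    while values near 0 suggest a single homogeneous cluster.
    """
    assignments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    return silhouette_score(features, assignments)
```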
“…Answers to this question depend on the specific threat models and defensive settings we consider. Under a strong threat model where adversaries can fully control the training process, a series of recent works [36,48,9,32,5,52] show that the latent representations of poison and clean samples can be made indistinguishable by explicitly encoding the indistinguishability objective into the training loss of the backdoored model.…”
Section: B2 Adaptive Backdoor Poisoning Attacks
confidence: 99%
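As a rough illustration of what "explicitly encoding the indistinguishability objective into the training loss" can look like, here is a sketch in PyTorch under assumptions (a model exposing illustrative `features` and `classify` methods; the penalty shown is a simple mean-feature distance, whereas the cited attacks use stronger distribution-matching or adversarial terms):

```python
import torch.nn.functional as F

def backdoor_training_loss(model, clean_x, clean_y, poison_x, poison_y, lam=1.0):
    """Classification loss plus a latent-indistinguishability penalty.

    `model.features` is assumed to return the penultimate-layer representation
    and `model.classify` to map features to logits; both names are illustrative.
    """
    f_clean = model.features(clean_x)
    f_poison = model.features(poison_x)

    # Standard supervised loss on both the clean and the poisoned batch.
    ce = F.cross_entropy(model.classify(f_clean), clean_y) \
       + F.cross_entropy(model.classify(f_poison), poison_y)

    # Penalty that pulls the two latent distributions together
    # (here: squared distance between batch means).
    indist = ((f_clean.mean(dim=0) - f_poison.mean(dim=0)) ** 2).sum()

    return ce + lam * indist
```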
“…It is worth noting that GRASP represents a different type of backdoor attack compared with the stealthy backdoors proposed recently (e.g., [12], [13]). Existing stealthy backdoor methods attempt to devise specific triggers, often dependent on the target neural network model, so that they are hard for defense methods to detect and mitigate.…”
Section: Introduction
confidence: 99%