Black-box Detection of Backdoor Attacks with Limited Information and Data

Dong, Yinpeng; Yang, Xiao; Deng, Zhongliang; Pang, Tianyu; Xiao, Zihao; Su, Hang; Zhu, Jun

doi:10.48550/arxiv.2103.13127

Cited by 4 publications

(17 citation statements)

References 14 publications


“…Consistent with prior studies (Dong et al, 2021; Kolouri et al, 2020), we deem a DNN backdoor-infected if one can make an arbitrary input misclassified as the target label with a minor modification to the input. Without loss of generality, given the original input x ∈ ℝ^n, the modified input containing the backdoor trigger can be formulated as:…”

Section: Problem Definition (mentioning)

confidence: 93%

“…However, their method still needs the DNN's parameters to train a separate generator (Goodfellow et al, 2014). So, strictly speaking, their method is not "black-box", which is also revealed by Dong et al (2021). To the best of our knowledge, Dong et al (2021) is the only existing work on detecting backdoor-infected DNNs in the black-box setting.…”

Section: Related Work (mentioning)

confidence: 99%

“…So, strictly speaking, their method is not "black-box", which is also revealed by Dong et al (2021). To the best of our knowledge, Dong et al (2021) is the only existing work on detecting backdoor-infected DNNs in the black-box setting. However, their method requires the predictive confidence score for each input to perform the NES algorithm (Wierstra et al, 2014), which weakens its practicability.…”

Section: Related Work (mentioning)

confidence: 99%
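The statement above turns on why NES needs confidence scores. A minimal sketch of an NES-style gradient estimate follows, assuming a hypothetical `score_fn` that returns a scalar confidence for an input; the function name, parameter names, and default values are illustrative and are not taken from Dong et al (2021). The estimate is built from differences of scores at perturbed inputs, which a label-only interface would not provide.

```python
import numpy as np


def nes_gradient(score_fn, x, sigma=0.1, num_pairs=50, rng=None):
    """Estimate the gradient of score_fn at x from score queries only.

    score_fn is a hypothetical callable returning a scalar confidence score.
    Antithetic Gaussian pairs (x + sigma*u, x - sigma*u) are used.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(num_pairs):
        u = rng.standard_normal(x.shape)  # random search direction
        # The score difference along u is the quantity that needs confidences.
        delta = score_fn(x + sigma * u) - score_fn(x - sigma * u)
        grad += delta * u
    return grad / (2.0 * sigma * num_pairs)
```

For a linear score the estimate converges to the true weight vector as `num_pairs` grows, which gives a quick sanity check for the sketch.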

“…There has been a significant amount of recent work on detecting the backdoor triggers. However, those solutions require access to the original poisoned training data (Chen et al, 2018; Tran et al, 2018), the parameters of the trained model (Chen et al, 2019a; Guo et al, 2019; Liu et al, 2019; Dong et al, 2021; Kolouri et al, 2020), or the predicted confidence score of each class (Dong et al, 2021). Unfortunately, it is costly and often impractical for the defender to access the original poisoned training dataset.…”

Section: Introduction (mentioning)

confidence: 99%

“…Inspired by the Univariate theory, we propose a global adversarial peak (GAP) value by sampling multiple examples and choosing the maximum over their adversarial peaks, to ensure a high success rate. Following previous works (Dong et al, 2021), the Median Absolute Deviation (MAD) algorithm is implemented on top of the GAP values to test whether a DNN is backdoor-infected.…”

Section: Introduction (mentioning)

confidence: 99%
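The GAP-then-MAD test described in the statement above can be sketched roughly as follows. This is an illustration under stated assumptions, not the citing paper's implementation: the per-label array layout, the function names, the two-sided deviation, the 1.4826 consistency constant (a common convention for normally distributed data), and the threshold of 2.0 are all assumptions introduced here.

```python
import numpy as np


def global_adversarial_peaks(peaks):
    """Take the maximum adversarial peak over sampled examples, per label.

    peaks is assumed to have shape (num_labels, num_samples).
    """
    return np.max(np.asarray(peaks, dtype=float), axis=1)


def mad_anomaly_index(values):
    """Deviation of each value from the median, in scaled MAD units."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    scale = 1.4826 * mad  # assumed consistency constant
    if scale == 0.0:
        return np.zeros_like(values)  # all values agree; nothing stands out
    return np.abs(values - median) / scale


def flag_outliers(values, threshold=2.0):
    """Flag labels whose anomaly index exceeds a hypothetical threshold."""
    return mad_anomaly_index(values) > threshold
```

With per-label GAP values such as `[1.0, 1.2, 0.9, 1.1, 9.0]`, only the last label lies far from the median relative to the MAD, so it is the one this sketch would flag as a candidate backdoor target.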


AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis

Guo, Li, Liu. 2021. Preprint.

Deep neural networks (DNNs) have been shown to be vulnerable to backdoor attacks. A backdoor is often embedded in the target DNN by injecting a backdoor trigger into training examples, which can cause the target DNN to misclassify an input attached with the backdoor trigger. Existing backdoor detection methods often require access to the original poisoned training data, the parameters of the target DNN, or the predictive confidence for each given input, which is impractical in many real-world applications, e.g., on-device deployed DNNs. We address the black-box hard-label backdoor detection problem, where the DNN is fully black-box and only its final output label is accessible. We approach this problem from the optimization perspective and show that the objective of backdoor detection is bounded by an adversarial objective. Further theoretical and empirical studies reveal that this adversarial objective leads to a solution with a highly skewed distribution; a singularity is often observed in the adversarial map of a backdoor-infected example, which we call the adversarial singularity phenomenon. Based on this observation, we propose adversarial extreme value analysis (AEVA) to detect backdoors in black-box neural networks. AEVA is based on an extreme value analysis of the adversarial map, computed from Monte Carlo gradient estimation. As evidenced by extensive experiments across multiple popular tasks and backdoor attacks, our approach is shown to be effective in detecting backdoor attacks under the black-box hard-label scenario.


Accumulative Poisoning Attacks on Real-time Data

Pang, Yang, Dong, et al. 2021. Preprint. Self cite.

Collecting training data from untrusted sources exposes machine learning services to poisoning adversaries, who maliciously manipulate training data to degrade the model accuracy. When models are trained on offline datasets, poisoning adversaries have to inject the poisoned data before training, and the order in which these poisoned batches are fed into the model is stochastic. In contrast, practical systems are more often trained or fine-tuned on sequentially captured real-time data, in which case poisoning adversaries can dynamically poison each data batch according to the current model state. In this paper, we focus on the real-time setting and propose a new attacking strategy, which couples an accumulative phase with poisoning attacks to secretly (i.e., without affecting accuracy) magnify the destructive effect of a (poisoned) trigger batch. By mimicking online learning and federated learning on CIFAR-10, we show that the model accuracy drops significantly after a single update step on the trigger batch following the accumulative phase. Our work validates that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects, with no need to explore complex techniques.


Towards Effective and Robust Neural Trojan Defenses via Input Filtering

Do, Harikumar, Lê, et al. 2022. Preprint.

Trojan attacks on deep neural networks are both dangerous and surreptitious. Over the past few years, Trojan attacks have advanced from using only a simple trigger and targeting only one class to using many sophisticated triggers and targeting multiple classes. However, Trojan defenses have not caught up with this development. Most defense methods still make out-of-date assumptions about Trojan triggers and target classes, and thus can be easily circumvented by modern Trojan attacks. In this paper, we advocate general defenses that are effective and robust against various Trojan attacks and propose two novel "filtering" defenses with these characteristics, called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF). VIF and AIF leverage variational inference and adversarial training, respectively, to purify all potential Trojan triggers in the input at run time without making any assumption about their numbers and forms. We further extend "filtering" to "filtering-then-contrasting", a new defense mechanism that helps avoid the drop in classification accuracy on clean data caused by filtering. Extensive experimental results show that our proposed defenses significantly outperform 4 well-known defenses in mitigating 5 different Trojan attacks, including the two state-of-the-art attacks, which defeat many strong defenses.
