With the wide deployment of deep neural network (DNN) classifiers, there is great potential for harm from adversarial learning attacks. Recently, a special type of data poisoning (DP) attack, known as a backdoor, was proposed. These attacks do not seek to degrade classification accuracy, but rather to have the classifier learn to classify to a target class whenever the backdoor pattern is present in a test example. Launching backdoor attacks does not require knowledge of the classifier or its training process; it only requires the ability to poison the training set with a sufficient number of exemplars containing a sufficiently strong backdoor pattern, labeled with the target class. Defenses against backdoor DP attacks can be deployed before/during training, post-training, or in-flight, i.e., during classifier operation/test time. Here, we address post-training detection of backdoor attacks in DNN image classifiers, a scenario seldom considered in existing works, wherein the defender does not have access to the poisoned training set, but only to the trained classifier itself and to clean (unpoisoned) examples from the classification domain. This scenario is of great interest because a trained classifier may be the basis of, e.g., a phone app that will be shared with many users; detecting a backdoor post-training may thus reveal a widespread attack. We propose a purely unsupervised anomaly detection (AD) defense against imperceptible backdoor attacks that: i) detects whether the trained DNN has been backdoor-attacked; ii) infers the source and target classes involved in a detected attack; and iii) can even accurately estimate the backdoor pattern itself. Our AD approach involves learning, via suitable cost function minimization, the minimum-size perturbation (putative backdoor) required to induce the classifier to misclassify (most) examples from class s to class t, for all (s, t) class pairs. Our hypothesis is that non-attacked pairs require large perturbations, while attacked pairs require much smaller ones; this is convincingly borne out experimentally. We identify a variety of plausible cost functions and devise a novel, robust hypothesis-testing approach to perform detection inference. We test our approach, in comparison with alternative defenses, for several backdoor patterns, data sets, and attack settings, and demonstrate its favorability. Our defense essentially requires setting a single hyperparameter (the detection threshold), which can, e.g., be chosen to fix the system's false positive rate.

The first two authors contributed equally to this work.
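To make the per-pair perturbation-learning step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a PyTorch classifier `model`, a batch of clean source-class images `x_s` with pixel values in [0, 1], and a single additive perturbation shared across the batch; the cost function shown (cross-entropy toward the target class plus an L2 size penalty weighted by a hypothetical `lam`) is just one plausible choice among the several the paper considers.

```python
import torch
import torch.nn.functional as F

def estimate_min_perturbation(model, x_s, target_class, steps=300, lr=0.01, lam=0.1):
    """Sketch: learn a small additive perturbation v such that model(x_s + v)
    is (mostly) classified to target_class. The L2 norm of the learned v is the
    per-(s, t) statistic: anomalously small norms suggest a backdoored pair.
    `lam` is a hypothetical weight trading off misclassification inducement
    against perturbation size."""
    model.eval()
    # One perturbation shared by all source-class images (putative backdoor pattern).
    v = torch.zeros_like(x_s[0], requires_grad=True)
    opt = torch.optim.Adam([v], lr=lr)
    t = torch.full((x_s.shape[0],), target_class, dtype=torch.long, device=x_s.device)
    for _ in range(steps):
        opt.zero_grad()
        # Keep perturbed images in the valid pixel range.
        logits = model(torch.clamp(x_s + v, 0.0, 1.0))
        loss = F.cross_entropy(logits, t) + lam * v.norm(p=2)
        loss.backward()
        opt.step()
    return v.detach(), v.detach().norm(p=2).item()
```

Running such an optimization for every ordered class pair (s, t) yields a collection of perturbation-size statistics; the robust hypothesis test then flags pairs whose sizes are anomalously small relative to the rest, with the detection threshold chosen, e.g., to fix the false positive rate.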