2022
DOI: 10.1109/tsc.2020.3000900

Backdoor Attacks Against Transfer Learning With Pre-Trained Deep Learning Models

Abstract: Transfer learning provides an effective way to quickly and feasibly customize accurate Student models by transferring, via fine-tuning, the knowledge that Teacher models have learned from pre-training on large datasets. Many pre-trained Teacher models used in transfer learning are publicly available and maintained by public platforms, increasing their vulnerability to backdoor attacks. In this paper, we demonstrate a backdoor threat to transfer learning tasks on both image and time-series data leveraging the knowledge …
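The fine-tuning workflow described in the abstract is also the attack surface: the Student inherits the Teacher's weights, so a backdoor planted in a publicly distributed Teacher can survive customization. As a minimal sketch of that workflow (not the paper's code), assuming PyTorch and a torchvision pre-trained ResNet-18 standing in for the Teacher and a hypothetical 10-class downstream task:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a publicly distributed pre-trained Teacher model; if its weights were
# backdoored upstream, the trigger behaviour can persist through the steps below.
teacher = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the feature-extraction layers, as is common in transfer learning.
for param in teacher.parameters():
    param.requires_grad = False

# Replace the classification head for the downstream (Student) task.
num_classes = 10  # assumption: a hypothetical 10-class Student task
teacher.fc = nn.Linear(teacher.fc.in_features, num_classes)
student = teacher

# Fine-tune only the new head on the Student's (clean) dataset.
optimizer = torch.optim.Adam(student.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(student(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only the new head is trained here, the frozen feature extractor, including any trigger-sensitive behaviour encoded in it, is carried over to the Student unchanged.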

Cited by 69 publications (34 citation statements)
References 37 publications
“…Our code can be accessed at https://github.com/goel96vibhor/AdvWeightPerturbations through data poisoning attacks. Wang et al [18] propose a backdoor injection scheme that defeats pruning-based, retraining-based and input pre-processing-based defenses. In parallel work, Kurita et al [14] expose the risk of the pre-trained BERT [4] model to backdoor injection attacks, mimicking a model capture scenario.…”
mentioning
confidence: 99%
“…However, using pre-trained models from foreign sources can pose a risk, as the models can be subject to biases and adversarial attacks, as introduced above. For example, pre-trained models may not properly reflect certain environmental constraints, or may contain backdoors implanted via classification triggers, e.g., to misclassify medical images (Wang et al 2020). Governmental interventions to redirect or suppress predictions are conceivable as well.…”
Section: Resource Limitations and Transfer Learning
mentioning
confidence: 99%
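The classification triggers mentioned in this snippet are typically implanted by poisoning a fraction of the training data. A minimal, hypothetical sketch of such a stamping step, assuming a small white patch in the image corner and a fixed attacker-chosen label (neither taken from the cited paper):

```python
import numpy as np

def stamp_trigger(image: np.ndarray, target_label: int,
                  patch_size: int = 4, patch_value: float = 1.0):
    """Illustrative poisoning step: stamp a small square trigger into the
    bottom-right corner of an HWC image and relabel it to the attacker's
    target class. Patch location, size, and value are assumptions."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = patch_value
    return poisoned, target_label

# During training-set poisoning, a fraction of samples would be replaced by
# their stamped versions; at inference, any input carrying the same patch
# is then steered toward target_label.
```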
“…Different from adversarial attacks, which usually act during the inference process of a neural model [17,38,49,53,63,63,66,74,84,85], backdoor attacks hack the model during training [10,22,40,51,61,62,75,82]. Defending against such attacks is challenging [8,23,37,41,57,73] because users have no idea what kind of poison has been injected into model training.…”
Section: Backdoor Attack and Defense
mentioning
confidence: 99%