2022 · Preprint
DOI: 10.48550/arxiv.2204.12848

Detecting Backdoor Poisoning Attacks on Deep Neural Networks by Heatmap Clustering

Abstract: Predictions made by neural networks can be fraudulently altered by so-called poisoning attacks. A special case is the backdoor poisoning attack. We study suitable detection methods and introduce a new method called Heatmap Clustering. There, we apply a k-means clustering algorithm to heatmaps produced by the state-of-the-art explainable AI method Layer-wise Relevance Propagation. The goal is to separate poisoned from un-poisoned data in the dataset. We compare this method with a similar method, called Activatio…
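The abstract describes the method only at a high level. Below is a minimal sketch of one plausible reading of the Heatmap Clustering step, assuming the per-sample explanation heatmaps (e.g., from Layer-wise Relevance Propagation) are already computed. The `heatmap_clustering` function, the array shapes, and the synthetic demo data are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: cluster flattened explanation heatmaps with k-means (k=2)
# and treat the smaller cluster as the suspected poisoned subset.
# `heatmaps` is a hypothetical array of shape (n_samples, H, W) holding
# precomputed relevance heatmaps.

import numpy as np
from sklearn.cluster import KMeans

def heatmap_clustering(heatmaps: np.ndarray) -> np.ndarray:
    """Return a boolean mask flagging samples in the smaller k-means cluster."""
    n = heatmaps.shape[0]
    flat = heatmaps.reshape(n, -1)  # one feature vector per heatmap
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(flat)
    # Poisoned samples are typically a minority, so flag the smaller cluster.
    smaller = np.argmin(np.bincount(labels, minlength=2))
    return labels == smaller

# Synthetic demo: 950 "clean" heatmaps and 50 "poisoned" ones whose
# relevance mass concentrates in a trigger-like corner patch.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.1, size=(950, 8, 8))
poisoned = rng.normal(0.0, 0.1, size=(50, 8, 8))
poisoned[:, :2, :2] += 1.0  # concentrated corner relevance
mask = heatmap_clustering(np.concatenate([clean, poisoned]))
print(f"flagged {mask.sum()} of 1000 samples")
```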

Cited by 2 publications (3 citation statements) · References 20 publications
“…We draw inspiration from defenses against poisoning, backdoor, and membership inference attacks, which are all related to isotopes (see §2), to identify techniques that could detect or disrupt isotopes. For example, A could try to detect isotopes using existing methods for spurious correlation detection [48,64] or by analyzing F to detect isotope-induced changes [10,25,51,59,69,71]. To disrupt isotopes, A could use adversarial augmentations during training [54,62], modify F's outputs to harm V's performance [32,63], or selectively retrain F so it forgets isotope features [40].…”
Section: Robustness To Adaptive Countermeasures
confidence: 99%
“…Since marks increase the probability of the marked label y_j for marked inputs, the feature-space region associated with y_j may exhibit isotope-specific behaviors. Several defenses against backdoor attacks use feature inspection to detect backdoors [10,25,51,59,69,71].…”
Section: Inspecting Features
confidence: 99%
“…The model's training data is scraped from public sources and can contain a small number of poisoned images [6]. A dataset sanitation process aims to filter or augment poisoned images [8,26,67] before training the model using a robust training approach [4,31,43,80]. After training, the defender can choose to inspect or repair the model using a limited amount of trustworthy data [44,46,79] before deployment when they suspect backdooring.…”
Section: Data Poisoning Defenses
confidence: 99%
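The pipeline described in that last snippet (sanitize the data, train robustly, then inspect or repair the model) can be summarized as a sequence of stages. A minimal sketch follows, assuming a detector such as the `heatmap_clustering` function sketched earlier; `detect_poisoned` and `train_model` are hypothetical stand-ins, not APIs from the cited works.

```python
# Hedged sketch of the sanitation-then-training workflow: drop the samples
# a detector flags, then hand the remainder to a training routine.

import numpy as np

def sanitize_and_train(images, labels, detect_poisoned, train_model):
    """Drop samples the detector flags, then train on the remainder."""
    flagged = detect_poisoned(images)  # boolean mask over samples
    keep = ~flagged
    print(f"dropping {int(flagged.sum())} of {len(images)} suspected samples")
    return train_model(images[keep], labels[keep])

# Toy usage with stand-in components: flag nothing, "train" by counting.
images = np.zeros((10, 8, 8))
labels = np.zeros(10, dtype=int)
model = sanitize_and_train(
    images, labels,
    detect_poisoned=lambda x: np.zeros(len(x), dtype=bool),
    train_model=lambda x, y: f"trained on {len(x)} samples",
)
print(model)
```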