2022 · Preprint
DOI: 10.48550/arxiv.2204.12848

Detecting Backdoor Poisoning Attacks on Deep Neural Networks by Heatmap Clustering

Abstract: Predictions made by neural networks can be fraudulently altered by so-called poisoning attacks. A special case is the backdoor poisoning attack. We study suitable detection methods and introduce a new method called Heatmap Clustering. There, we apply a k-means clustering algorithm to heatmaps produced by the state-of-the-art explainable AI method Layer-wise Relevance Propagation. The goal is to separate poisoned from un-poisoned data in the dataset. We compare this method with a similar method, called Activatio…
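The abstract describes the method only at a high level. Below is a minimal sketch of one plausible reading of the Heatmap Clustering step, assuming the per-sample explanation heatmaps (e.g., from Layer-wise Relevance Propagation) are already computed. The `heatmap_clustering` function, the array shapes, and the synthetic demo data are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: cluster flattened explanation heatmaps with k-means (k=2)
# and treat the smaller cluster as the suspected poisoned subset.
# `heatmaps` is a hypothetical array of shape (n_samples, H, W) holding
# precomputed relevance heatmaps.

import numpy as np
from sklearn.cluster import KMeans

def heatmap_clustering(heatmaps: np.ndarray) -> np.ndarray:
    """Return a boolean mask flagging samples in the smaller k-means cluster."""
    n = heatmaps.shape[0]
    flat = heatmaps.reshape(n, -1)  # one feature vector per heatmap
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(flat)
    # Poisoned samples are typically a minority, so flag the smaller cluster.
    smaller = np.argmin(np.bincount(labels, minlength=2))
    return labels == smaller

# Synthetic demo: 950 "clean" heatmaps and 50 "poisoned" ones whose
# relevance mass concentrates in a trigger-like corner patch.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.1, size=(950, 8, 8))
poisoned = rng.normal(0.0, 0.1, size=(50, 8, 8))
poisoned[:, :2, :2] += 1.0  # concentrated corner relevance
mask = heatmap_clustering(np.concatenate([clean, poisoned]))
print(f"flagged {mask.sum()} of 1000 samples")
```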

Cited by 2 publications (3 citation statements) · References 20 publications
“…We draw inspiration from defenses against poisoning, backdoor, and membership inference attacks, which are all related to isotopes (see §2), to identify techniques that could detect or disrupt isotopes. For example, A could try to detect isotopes using existing methods for spurious correlation detection [48,64] or by analyzing F to detect isotope-induced changes [10,25,51,59,69,71]. To disrupt isotopes, A could use adversarial augmentations during training [54,62], modify F's outputs to harm V's performance [32,63], or selectively retrain F so it forgets isotope features [40].…”
Section: Robustness To Adaptive Countermeasures
confidence: 99%
“…Since marks increase the probability of the marked label y_j for marked inputs, the feature-space region associated with y_j may exhibit isotope-specific behaviors. Several defenses against backdoor attacks use feature inspection to detect backdoors [10,25,51,59,69,71].…”
Section: Inspecting Features
confidence: 99%
“…The model's training data is scraped from public sources and can contain a small number of poisoned images [6]. A dataset sanitation process aims to filter or augment poisoned images [8,26,67] before training the model using a robust training approach [4,31,43,80]. After training, the defender can choose to inspect or repair the model using a limited amount of trustworthy data [44,46,79] before deployment when they suspect backdooring.…”
Section: Data Poisoning Defenses
confidence: 99%
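The pipeline described in that last snippet (sanitize the data, train robustly, then inspect or repair the model) can be summarized as a sequence of stages. A minimal sketch follows, assuming a detector such as the `heatmap_clustering` function sketched earlier; `detect_poisoned` and `train_model` are hypothetical stand-ins, not APIs from the cited works.

```python
# Hedged sketch of the sanitation-then-training workflow: drop the samples
# a detector flags, then hand the remainder to a training routine.

import numpy as np

def sanitize_and_train(images, labels, detect_poisoned, train_model):
    """Drop samples the detector flags, then train on the remainder."""
    flagged = detect_poisoned(images)  # boolean mask over samples
    keep = ~flagged
    print(f"dropping {int(flagged.sum())} of {len(images)} suspected samples")
    return train_model(images[keep], labels[keep])

# Toy usage with stand-in components: flag nothing, "train" by counting.
images = np.zeros((10, 8, 8))
labels = np.zeros(10, dtype=int)
model = sanitize_and_train(
    images, labels,
    detect_poisoned=lambda x: np.zeros(len(x), dtype=bool),
    train_model=lambda x, y: f"trained on {len(x)} samples",
)
print(model)
```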