2020
DOI: 10.48550/arxiv.2006.10013
Preprint

Adversarial Examples Detection and Analysis with Layer-wise Autoencoders

Abstract: We present a mechanism for detecting adversarial examples based on data representations taken from the hidden layers of the target network. For this purpose, we train individual autoencoders at intermediate layers of the target network. This allows us to describe the manifold of true data and, in consequence, decide whether a given example has the same characteristics as true data. It also gives us insight into the behavior of adversarial examples and their flow through the layers of a deep neural network. Exp…
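As a rough illustration of this mechanism, the sketch below trains one autoencoder per hidden layer of a small frozen classifier and scores inputs by their per-layer reconstruction error. This is a minimal PyTorch toy, not the authors' implementation; the architecture, the tapped layers, the bottleneck size, and the stand-in data are all assumptions.

```python
# Minimal sketch (not the paper's released code): train one autoencoder per
# hidden layer of a frozen target network; inputs whose hidden representations
# reconstruct poorly are treated as candidates for adversarial examples.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy frozen "target network"; the layer sizes are assumptions.
target = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # indices 0-1
    nn.Linear(256, 128), nn.ReLU(),   # indices 2-3
    nn.Linear(128, 10),               # logits
)
for p in target.parameters():
    p.requires_grad_(False)

def hidden_reprs(x):
    """Post-activation representations at the two tapped layers."""
    h1 = target[1](target[0](x))      # 256-d
    h2 = target[3](target[2](h1))     # 128-d
    return [h1, h2]

def make_ae(dim, bottleneck=32):
    return nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(),
                         nn.Linear(bottleneck, dim))

aes = [make_ae(256), make_ae(128)]

# Train each autoencoder to reconstruct true-data representations only.
x_true = torch.randn(512, 784)        # stand-in for real training data
opt = torch.optim.Adam([p for ae in aes for p in ae.parameters()], lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = sum(nn.functional.mse_loss(ae(h), h)
               for ae, h in zip(aes, hidden_reprs(x_true)))
    loss.backward()
    opt.step()

def detection_scores(x):
    """Per-layer, per-sample reconstruction errors; large values suggest
    a representation off the manifold the autoencoders were trained on."""
    with torch.no_grad():
        return [nn.functional.mse_loss(ae(h), h, reduction="none").mean(dim=1)
                for ae, h in zip(aes, hidden_reprs(x))]
```

A per-layer threshold calibrated on clean validation data would then turn these scores into a detector.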

Cited by 5 publications (17 citation statements)
References 21 publications

“…These detectors are trained on adversarial samples generated by the FGSM attack. Unsupervised detectors such as Odds-testing (Roth, Kilcher, and Hofmann 2019) and AE-layers (Wójcik et al. 2020) are also considered. The details about these detectors are given in Appendix C.3.…”
Section: Detection of Adversarial Samples
Mentioning confidence: 99%

“…Specifically, they measure the change in pair-wise class logits caused by adding noise to the input to detect adversaries. AE (Wójcik et al. 2020) makes use of autoencoders trained on the feature space from every layer of the classifier to detect adversarial inputs. Specifically, they use the reconstruction error and latent norm from the trained autoencoders as scores for adversarial detection.…”
Section: C.3 Adversarial
Mentioning confidence: 99%
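As a hedged sketch of the scoring described above, the snippet below computes both quantities from a single trained autoencoder: the per-sample reconstruction error and the norm of the latent code. The encoder/decoder shapes, the thresholds, and the rule combining the two scores are illustrative assumptions, not the cited paper's exact formulation.

```python
# Hedged sketch of the two scores described above: per-sample reconstruction
# error and latent-code norm from a trained autoencoder. The shapes,
# thresholds, and decision rule are assumptions, not the paper's formulation.
import torch
import torch.nn as nn

class ScoringAE(nn.Module):
    def __init__(self, dim, bottleneck=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.dec = nn.Linear(bottleneck, dim)

    def scores(self, h):
        """Return (reconstruction error, latent norm), one value per sample."""
        z = self.enc(h)
        rec_err = ((self.dec(z) - h) ** 2).mean(dim=1)  # reconstruction error
        lat_norm = z.norm(dim=1)                        # norm of the latent code
        return rec_err, lat_norm

ae = ScoringAE(dim=128)              # dim matches the tapped layer's width
h = torch.randn(4, 128)              # stand-in hidden representations
rec_err, lat_norm = ae.scores(h)

# One simple (assumed) decision rule: flag an input if either score is
# atypical relative to thresholds calibrated on clean validation data.
tau_rec, tau_norm = 1.5, 10.0        # hypothetical thresholds
is_adversarial = (rec_err > tau_rec) | (lat_norm > tau_norm)
```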
“…For the case of supervised detectors, their detecting capability depends on how they capture the differences between adversarial and benign examples. Techniques range from studying statistical properties [5][6][7] to training traditional machine learning classifiers [21,22] and deep classifiers [8,11,16,23]. It is widely accepted that supervised methods cannot generalize well to adversarial examples produced by unseen attacks.…”
Section: Detecting Adversarial Examples
Mentioning confidence: 99%

“…When this happens, all reconstruction errors (REs) are mixed, which leads to a high false negative or false positive rate (FNR/FPR) during detection. To refine the volume of the manifold drawn by the AE and reduce its unnecessary generalization to adversarial examples, a number of variants exist [15][16][17][18], as will be reviewed in detail in Sec. 2.2.…”
Section: Introduction
Mentioning confidence: 99%
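To make the trade-off in this statement concrete, the toy example below thresholds overlapping reconstruction-error distributions and reports the resulting FPR and FNR; both distributions are synthetic and chosen only for illustration.

```python
# Toy illustration of the FNR/FPR trade-off: when the reconstruction-error
# (RE) distributions of clean and adversarial inputs overlap, any single
# threshold misses part of one population or wrongly flags part of the other.
# Both RE distributions below are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(0)
re_clean = rng.normal(loc=1.0, scale=0.5, size=10_000)  # REs of clean inputs
re_adv = rng.normal(loc=1.6, scale=0.5, size=10_000)    # REs of adversarial inputs

for tau in (1.0, 1.3, 1.6):
    fpr = (re_clean > tau).mean()   # clean inputs wrongly flagged
    fnr = (re_adv <= tau).mean()    # adversarial inputs missed
    print(f"threshold={tau:.1f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```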