2020
DOI: 10.48550/arxiv.2006.10013
Preprint

Adversarial Examples Detection and Analysis with Layer-wise Autoencoders

Abstract: We present a mechanism for detecting adversarial examples based on data representations taken from the hidden layers of the target network. For this purpose, we train individual autoencoders at intermediate layers of the target network. This allows us to describe the manifold of true data and, in consequence, decide whether a given example has the same characteristics as true data. It also gives us insight into the behavior of adversarial examples and their flow through the layers of a deep neural network. Exp…
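As a rough illustration of this mechanism, the sketch below trains one autoencoder per hidden layer of a small frozen classifier and scores inputs by their per-layer reconstruction error. This is a minimal PyTorch toy, not the authors' implementation; the architecture, the tapped layers, the bottleneck size, and the stand-in data are all assumptions.

```python
# Minimal sketch (not the paper's released code): train one autoencoder per
# hidden layer of a frozen target network; inputs whose hidden representations
# reconstruct poorly are treated as candidates for adversarial examples.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy frozen "target network"; the layer sizes are assumptions.
target = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # indices 0-1
    nn.Linear(256, 128), nn.ReLU(),   # indices 2-3
    nn.Linear(128, 10),               # logits
)
for p in target.parameters():
    p.requires_grad_(False)

def hidden_reprs(x):
    """Post-activation representations at the two tapped layers."""
    h1 = target[1](target[0](x))      # 256-d
    h2 = target[3](target[2](h1))     # 128-d
    return [h1, h2]

def make_ae(dim, bottleneck=32):
    return nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(),
                         nn.Linear(bottleneck, dim))

aes = [make_ae(256), make_ae(128)]

# Train each autoencoder to reconstruct true-data representations only.
x_true = torch.randn(512, 784)        # stand-in for real training data
opt = torch.optim.Adam([p for ae in aes for p in ae.parameters()], lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = sum(nn.functional.mse_loss(ae(h), h)
               for ae, h in zip(aes, hidden_reprs(x_true)))
    loss.backward()
    opt.step()

def detection_scores(x):
    """Per-layer, per-sample reconstruction errors; large values suggest
    a representation off the manifold the autoencoders were trained on."""
    with torch.no_grad():
        return [nn.functional.mse_loss(ae(h), h, reduction="none").mean(dim=1)
                for ae, h in zip(aes, hidden_reprs(x))]
```

A per-layer threshold calibrated on clean validation data would then turn these scores into a detector.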

Cited by 5 publications (17 citation statements)
References 21 publications

“…These detectors are trained on adversarial samples generated by the FGSM attack. Unsupervised detectors such as Odds-testing (Roth, Kilcher, and Hofmann 2019) and AE-layers (Wójcik et al. 2020) are also considered. The details about these detectors are given in Appendix C.3.…”
Section: Detection of Adversarial Samples
Mentioning confidence: 99%

“…Specifically, they measure the change in pair-wise class logits caused by adding noise to the input to detect adversaries. AE (Wójcik et al. 2020) makes use of autoencoders trained on the feature space from every layer of the classifier to detect adversarial inputs. Specifically, they use the reconstruction error and latent norm from the trained autoencoders as scores for adversarial detection.…”
Section: C.3 Adversarial
Mentioning confidence: 99%
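As a hedged sketch of the scoring described above, the snippet below computes both quantities from a single trained autoencoder: the per-sample reconstruction error and the norm of the latent code. The encoder/decoder shapes, the thresholds, and the rule combining the two scores are illustrative assumptions, not the cited paper's exact formulation.

```python
# Hedged sketch of the two scores described above: per-sample reconstruction
# error and latent-code norm from a trained autoencoder. The shapes,
# thresholds, and decision rule are assumptions, not the paper's formulation.
import torch
import torch.nn as nn

class ScoringAE(nn.Module):
    def __init__(self, dim, bottleneck=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.dec = nn.Linear(bottleneck, dim)

    def scores(self, h):
        """Return (reconstruction error, latent norm), one value per sample."""
        z = self.enc(h)
        rec_err = ((self.dec(z) - h) ** 2).mean(dim=1)  # reconstruction error
        lat_norm = z.norm(dim=1)                        # norm of the latent code
        return rec_err, lat_norm

ae = ScoringAE(dim=128)              # dim matches the tapped layer's width
h = torch.randn(4, 128)              # stand-in hidden representations
rec_err, lat_norm = ae.scores(h)

# One simple (assumed) decision rule: flag an input if either score is
# atypical relative to thresholds calibrated on clean validation data.
tau_rec, tau_norm = 1.5, 10.0        # hypothetical thresholds
is_adversarial = (rec_err > tau_rec) | (lat_norm > tau_norm)
```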
“…For the case of supervised detectors, their detecting capability depends on how they capture the differences between adversarial and benign examples. Techniques range from studying statistical properties [5][6][7] to training traditional machine learning classifiers [21,22] and deep classifiers [8,11,16,23]. It is widely accepted that supervised methods cannot generalize well to adversarial examples produced by unseen attacks.…”
Section: Detecting Adversarial Examples
Mentioning confidence: 99%

“…When this happens, all reconstruction errors (REs) are mixed, which leads to a high false negative or false positive rate (FNR/FPR) during detection. To refine the volume of the manifold drawn by the AE and reduce its unnecessary generalization to adversarial examples, a number of variants exist [15][16][17][18], as will be reviewed in detail in Sec. 2.2.…”
Section: Introduction
Mentioning confidence: 99%
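To make the trade-off in this statement concrete, the toy example below thresholds overlapping reconstruction-error distributions and reports the resulting FPR and FNR; both distributions are synthetic and chosen only for illustration.

```python
# Toy illustration of the FNR/FPR trade-off: when the reconstruction-error
# (RE) distributions of clean and adversarial inputs overlap, any single
# threshold misses part of one population or wrongly flags part of the other.
# Both RE distributions below are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(0)
re_clean = rng.normal(loc=1.0, scale=0.5, size=10_000)  # REs of clean inputs
re_adv = rng.normal(loc=1.6, scale=0.5, size=10_000)    # REs of adversarial inputs

for tau in (1.0, 1.3, 1.6):
    fpr = (re_clean > tau).mean()   # clean inputs wrongly flagged
    fnr = (re_adv <= tau).mean()    # adversarial inputs missed
    print(f"threshold={tau:.1f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```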