2019
DOI: 10.48550/arxiv.1905.02175
Preprint

Adversarial Examples Are Not Bugs, They Are Features

Abstract: Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans. After capturing these features within a theoretical framework, we establish their widespread existence in standard datasets…
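To make the notion of "highly predictive, yet brittle" features concrete, below is a minimal, hedged sketch of how an adversarial example is typically constructed with projected gradient descent (PGD). The `model`, `x`, and `y` arguments are placeholders (any PyTorch image classifier and a batch of images in [0, 1]); this illustrates the general attack, not the authors' exact experimental setup.

```python
# Minimal PGD sketch (PyTorch); `model`, `x`, `y` are assumed placeholders.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Maximize the classification loss within an l-infinity ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Take a signed gradient step, then project back onto the eps-ball
        # and the valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

# Usage (hypothetical): x_adv = pgd_attack(model, images, labels)
# The perturbed image is typically visually indistinguishable from the original,
# yet the model's prediction on it flips.
```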

Cited by 142 publications (237 citation statements)
References 17 publications
“…Also, we show that these methods consistently achieve higher standard accuracy (i.e., non-adversarial accuracy) than the nominal neural networks trained without robustness. While this result is not true for a general choice of uncertainty set (see for example Ilyas et al (2019)), we observe that when the uncertainty set has the appropriate size it can significantly improve the classification performance of the network, which is consistent with the results obtained for other classification models like Support Vector Machines, Logistic Regression and Classification Trees (Bertsimas et al, 2019).…”
Section: Introduction (supporting)
confidence: 87%
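The excerpt above concerns robust (adversarial) training, where the "uncertainty set" is the perturbation ball the model is trained to withstand. Below is a hedged, minimal sketch of one such training step; `model`, `optimizer`, and the batch `(x, y)` are assumed to exist, and the `eps` radius stands in for the uncertainty-set size the excerpt discusses. This is an illustration of the general scheme, not the cited authors' exact procedure: a moderate radius may help even standard accuracy, while an overly large one typically hurts it.

```python
# Hedged sketch of one robust-training step (PyTorch). `model`, `optimizer`,
# and the batch (x, y) are assumed; eps is the uncertainty-set radius.
import torch
import torch.nn.functional as F

def robust_training_step(model, optimizer, x, y, eps=8/255):
    # Inner maximization: find a worst-case point inside the l-infinity
    # eps-ball around x (a single FGSM step here, for brevity).
    x_pert = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_pert), y)
    grad, = torch.autograd.grad(loss, x_pert)
    x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

    # Outer minimization: update the model on the worst-case inputs.
    optimizer.zero_grad()
    robust_loss = F.cross_entropy(model(x_adv), y)
    robust_loss.backward()
    optimizer.step()
    return robust_loss.item()
```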
“…Some defenses focus on masking the computational process of the model, for example through non-differentiable layers [35]. Ilyas et al [10] claim that NNs do learn to classify correctly based on their training set, and that their vulnerability reflects higher-order features that exist in the dataset and are not accessible to humans. Therefore, their suggested defense was to train on special datasets that do not contain such features.…”
Section: B. Defense Methods (mentioning)
confidence: 99%
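For context on the defense mentioned above (training on a dataset stripped of non-robust features), here is a hedged sketch of the general idea: regenerate each training image so that it only matches an adversarially robust model's feature representation of the original. The `robust_features` function (e.g., the penultimate-layer embedding of a robustly trained network) and the hyperparameters are assumptions for illustration; the original work's initialization and solver may differ.

```python
# Hedged sketch of building a "robustified" training image (PyTorch).
# `robust_features` is assumed: e.g., the penultimate-layer embedding of an
# adversarially trained classifier. Hyperparameters are illustrative.
import torch

def robustify_image(robust_features, x, steps=200, lr=0.1):
    target = robust_features(x).detach()
    # Start from noise so that no signal from x is carried over except what is
    # needed to match the robust model's representation.
    x_r = torch.rand_like(x, requires_grad=True)
    opt = torch.optim.SGD([x_r], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (robust_features(x_r) - target).pow(2).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_r.clamp_(0.0, 1.0)
    return x_r.detach()

# Training a standard classifier on pairs (robustify_image(g, x), y) is the
# flavor of defense the excerpt attributes to Ilyas et al. [10].
```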
“…AEs may be robust to physical transformations [5] and therefore pose a security threat not only for cyber-world applications [6] but also for real-world systems, such as the computer vision of autonomous vehicles [7]. Although various defense techniques have been suggested, none of them can ensure a completely effective defense in all settings [8]-[10].…”
Section: Introduction (mentioning)
confidence: 99%
“…Shortcut learning. Recently, the community has realized that deep models may rely on shortcuts to make decisions [Beery et al., 2018, Niven and Kao, 2019, Ilyas et al., 2019, Geirhos et al., 2020, Huh et al., 2021]. Shortcuts are spurious features that are correlated with training labels but do not generalize on test data.…”
Section: Related Work (mentioning)
confidence: 99%
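To illustrate the shortcut phenomenon described in the excerpt above, here is a small self-contained toy example (all data and numbers are synthetic and purely illustrative): a feature that is perfectly correlated with the labels during training is randomized at test time, so a linear classifier that leans on it generalizes poorly.

```python
# Toy illustration of shortcut learning (synthetic data, illustrative only).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n = 2000
core = torch.randn(n, 1)                          # noisy but genuinely predictive feature
labels = (core.squeeze(1) + 0.5 * torch.randn(n) > 0).float()
shortcut_train = labels.unsqueeze(1) * 2 - 1      # spurious feature: perfect at train time
shortcut_test = torch.sign(torch.randn(n, 1))     # same feature, decorrelated at test time

x_train = torch.cat([core, shortcut_train], dim=1)
x_test = torch.cat([core, shortcut_test], dim=1)

# Plain logistic regression trained by gradient descent.
w = torch.zeros(2, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.5)
for _ in range(500):
    opt.zero_grad()
    loss = F.binary_cross_entropy_with_logits(x_train @ w, labels)
    loss.backward()
    opt.step()

accuracy = lambda x: ((x @ w > 0).float() == labels).float().mean().item()
# Train accuracy is near 1.0 (the shortcut separates the data perfectly);
# test accuracy drops sharply once the shortcut no longer carries label information.
print(f"train acc: {accuracy(x_train):.2f}, test acc: {accuracy(x_test):.2f}")
```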