2017
DOI: 10.48550/arxiv.1706.06083
Preprint

Towards Deep Learning Models Resistant to Adversarial Attacks

Abstract: Recent work has demonstrated that neural networks are vulnerable to adversarial examples, i.e., inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying …
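The robust optimization view the abstract refers to is the paper's saddle-point (min-max) objective: an inner maximization searches for a worst-case perturbation inside an allowed set, and an outer minimization trains the weights against it. Written in the paper's standard notation, with S an allowed perturbation set such as an ℓ∞-ball of radius ε:

```latex
\min_{\theta} \; \rho(\theta),
\qquad
\rho(\theta) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\, \max_{\delta \in \mathcal{S}} \; L(\theta,\, x + \delta,\, y) \,\Big]
```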

Cited by 2,139 publications (3,801 citation statements)
References 25 publications (58 reference statements)
“…Although truncation on its own is expected to increase a classifier's robustness, we suggest going farther and coupling our framework with adversarial training as originally proposed by [8]. In the Gaussian mixture setting considered in Section 4, we prove that the asymptotically optimal classifier requires truncation as well as an optimization step for finding the best weights that resemble adversarial training.…”
Section: Adversarial Training (mentioning)
confidence: 94%
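For context on what "adversarial training as originally proposed by [8]" involves, here is a minimal, illustrative PyTorch-style sketch of the training loop, not the truncation framework described in the citation statement: the inner maximization is approximated by an attack (such as PGD, sketched further below), and the outer step minimizes the loss on the resulting adversarial batch. The `attack` callable and its signature are hypothetical placeholders.

```python
import torch

def adversarial_training_epoch(model, loader, optimizer, loss_fn, attack):
    """One epoch of adversarial training in the min-max sense:
    approximate the inner maximization with `attack`, then take an
    ordinary gradient step on the adversarial batch (outer minimization).
    `attack(model, loss_fn, x, y) -> x_adv` is a hypothetical interface."""
    model.train()
    for x, y in loader:
        x_adv = attack(model, loss_fn, x, y)   # inner maximization (approximate)
        optimizer.zero_grad()
        loss = loss_fn(model(x_adv), y)        # outer minimization objective
        loss.backward()
        optimizer.step()
```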
“…They were initially shown to be effective in causing classification errors throughout different machine learning models [5,6,7]. Following this, a lot of effort has been put into generating increasingly more complex attack models that can utilize a small amount of semantic-preserving modifications, while still being able to fool a classifier [8,9,10]. Typically, this is done by constraining the perturbations with an ℓp-norm, where the most common settings use either ℓ∞ [11,12,9,8,13,14,15], ℓ2 [16,9,17,18,19], or ℓ1 [20,21].…”
Section: Introduction (mentioning)
confidence: 99%
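The ℓp-ball constraints mentioned in this statement are typically enforced by projecting the perturbation back onto the ball after each attack step. A minimal sketch of the two most common projections (ℓ∞ and ℓ2), in illustrative PyTorch code with hypothetical helper names:

```python
import torch

def project_linf(delta, eps):
    """Project a perturbation onto the l_inf ball of radius eps:
    clip every coordinate independently to [-eps, eps]."""
    return delta.clamp(-eps, eps)

def project_l2(delta, eps):
    """Project a batch of perturbations onto the l_2 ball of radius eps:
    rescale any perturbation whose norm exceeds eps back to the boundary."""
    flat = delta.flatten(start_dim=1)
    norms = flat.norm(p=2, dim=1, keepdim=True).clamp_min(1e-12)
    scale = (eps / norms).clamp(max=1.0)
    return (flat * scale).view_as(delta)
```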
“…The projected gradient descent (PGD) [16] we adopted is the most common method for finding adversarial examples. Given x_0 ∈ X, …”
Section: Guiding Adversarial Examples (mentioning)
confidence: 99%
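The PGD iteration referenced here, under an ℓ∞ constraint, takes repeated signed-gradient ascent steps on the loss starting from x_0 (or a random point near it) and projects back onto the ε-ball after every step. A minimal PyTorch-style sketch with illustrative hyperparameter values, not taken from either paper:

```python
import torch

def pgd_attack(model, loss_fn, x0, y, eps=8/255, alpha=2/255, steps=10):
    """l_inf-constrained PGD: random start in the eps-ball, then `steps`
    signed-gradient ascent steps, each followed by projection onto the
    eps-ball around x0 and onto the valid input range [0, 1]."""
    x = (x0 + torch.empty_like(x0).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        loss = loss_fn(model(x), y)
        grad, = torch.autograd.grad(loss, x)
        x = x + alpha * grad.sign()              # ascent step on the loss
        x = x0 + (x - x0).clamp(-eps, eps)       # project onto the eps-ball
        x = x.clamp(0.0, 1.0)                    # keep a valid input
    return x.detach()
```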