2019
DOI: 10.1609/aaai.v33i01.3301742
AutoZOOM: Autoencoder-Based Zeroth Order Optimization Method for Attacking Black-Box Neural Networks

Abstract: Recent studies have shown that adversarial examples in state-of-the-art image classifiers trained by deep neural networks (DNN) can be easily generated when the target model is transparent to an attacker, known as the white-box setting. However, when attacking a deployed machine learning service, one can only acquire the input-output correspondences of the target model; this is the so-called black-box attack setting. The major drawback of existing black-box attacks is the need for excessive model queries, which…
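To make the black-box setting concrete, the sketch below wraps a classifier so that an attacker only sees output scores for queried inputs and every query is counted; the `predict_fn` service and the toy linear model are hypothetical stand-ins, not anything from the paper.

```python
import numpy as np

class BlackBoxModel:
    """Expose a classifier as input -> scores only, with a query counter.

    `predict_fn` is a hypothetical stand-in for a deployed service; the
    attacker gets no gradients or weights, so the number of queries is
    the main cost, as the abstract emphasizes.
    """

    def __init__(self, predict_fn):
        self.predict_fn = predict_fn
        self.query_count = 0

    def query(self, x):
        self.query_count += 1
        return self.predict_fn(x)  # class scores only


# Toy usage with a random linear "model" standing in for the remote service.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))            # 10 classes, flattened 28x28 input
model = BlackBoxModel(lambda x: W @ x)

x = rng.normal(size=784)
scores = model.query(x)
print(scores.argmax(), model.query_count)  # predicted class, queries used so far
```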

Cited by 327 publications (280 citation statements). References 13 publications.
“…In addition to the aforementioned works, there are also other black-box attacks [20,21,22,23] under different practical settings, which have been explored very recently. Among those, the notable boundary method [20] implements a decision-based attack, which starts from a very large adversarial perturbation (thus causing an immediate misclassification) and tries to reduce the perturbation (i.e., minimize the distortion) through a random walk while remaining adversarial by staying on the boundary between the misclassified class and the true class.…”
Section: Other Black-box Attacks
confidence: 99%
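A minimal sketch of the random-walk idea behind the boundary method quoted above follows; it is a simplification, not the cited authors' exact procedure, and `query_label` is a hypothetical oracle returning only the predicted class.

```python
import numpy as np

def boundary_walk(query_label, x_orig, x_adv_init, true_label,
                  steps=1000, step_size=0.01, shrink=0.01, rng=None):
    """Decision-based random walk: start from a heavily perturbed,
    already-misclassified point and repeatedly propose small random moves
    plus a contraction toward the original image, keeping only proposals
    that remain misclassified (i.e., stay on the adversarial side of the
    decision boundary)."""
    rng = rng or np.random.default_rng(0)
    x_adv = x_adv_init.copy()
    for _ in range(steps):
        diff = x_orig - x_adv
        noise = rng.normal(size=x_adv.shape)
        # Scale the random step relative to the current distortion.
        noise *= step_size * np.linalg.norm(diff) / (np.linalg.norm(noise) + 1e-12)
        candidate = x_adv + noise + shrink * diff
        if query_label(candidate) != true_label:  # still adversarial -> accept
            x_adv = candidate
    return x_adv
```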
“…It should be noted that recent research on black-box attacks has largely focused on classifiers that provide confidence scores, which is an easier setting. Nevertheless, many of these methods also use random sampling [6,21,10], and the biases we propose could also benefit their approaches. As an aside, Ilyas et al [10] propose a variation of their attack that manages to apply gradient estimation to discrete labels.…”
Section: Sampling-based
confidence: 99%
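The "random sampling" mentioned in the quoted passage typically means estimating a gradient from scores by probing random directions. Below is a generic NES-style sketch with antithetic sampling; `loss_fn` is a hypothetical scalar loss built from the model's confidence scores, and this is not tied to any one cited paper's exact scheme.

```python
import numpy as np

def sampling_gradient_estimate(loss_fn, x, sigma=0.01, n_samples=50, rng=None):
    """Estimate the gradient of loss_fn at x from queries only.

    Each Gaussian direction u is probed at x + sigma*u and x - sigma*u,
    so the estimate costs 2 * n_samples queries.
    """
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)
        grad += (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return grad / (2.0 * sigma * n_samples)
```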
“…Finite-difference methods (FDM), which are also known as zeroth-order optimization methods, directly estimate gradients Ĝ_dnn'(x) for a target API' by making repeated queries around x [7,21,22,41] and recording minute differences in the returned values. The baseline assumption is that API' returns maximum information (API'_I).…”
Section: Adversarial Example Definitions
confidence: 99%
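For concreteness, here is a coordinate-wise finite-difference gradient estimate of the kind the quoted passage describes; `score_fn` is a hypothetical scalar returned by the target API in the maximum-information setting, and the sketch illustrates why full-gradient estimation over high-dimensional images is so query-hungry (two queries per coordinate).

```python
import numpy as np

def finite_difference_gradient(score_fn, x, h=1e-4, coords=None):
    """Symmetric finite-difference (zeroth-order) gradient estimate.

    Estimates d score_fn / d x_i as (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    for each requested coordinate i, using only queries to score_fn.
    """
    flat_x = x.reshape(-1)
    coords = range(flat_x.size) if coords is None else coords
    grad = np.zeros_like(flat_x, dtype=float)
    for i in coords:
        e = np.zeros_like(flat_x)
        e[i] = h
        grad[i] = (score_fn((flat_x + e).reshape(x.shape))
                   - score_fn((flat_x - e).reshape(x.shape))) / (2 * h)
    return grad.reshape(x.shape)
```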