Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach

Cheng, Minhao; Le, Thong; Chen, Pin-Yu; Yi, Jinfeng; Zhang, Huan; Hsieh, Cho‐Jui

doi:10.48550/arxiv.1807.04457

Cited by 77 publications

(125 citation statements)

References 14 publications

Mentioning: 123

“…On each of the datasets Cifar-10, Cifar-100, Tiered_T84, and Tiered_V56, we train the seven models ResNet-18, -34, SeResNet-26, VGG-16, MobileNet-V1, MobileNet-V3, and DenseNet-26. The network architectures of all seven models are defined in the public GitHub repository (footnote 5). We use consistent hyper-parameters to train all the models for 80,000 iterations without data augmentation.…”

Section: A Appendix (mentioning)

confidence: 99%

“…In lots of cases, the attack success rate of ICE is more than twice as much as that of baselines, which further indicates the effectiveness of the proposed ICE. [Footnote 5: https://github.com/yxlijun/cifar-tensorflow]…”

Section: A Appendix (mentioning)

confidence: 99%

“…As a result, the security and robustness of DNNs have attracted growing attention from both academia and industry [51,9,48,20]. Existing methods for generating adversarial examples, also known as "attacks", can be categorized by the following threat models: white-box [18,40], query-based black-box [2,28,5], and query-free black-box attacks [37,43]. As Table 1 shows, in the white-box attack setting, the attacker can access all information of the victim model, while in the query-based or query-free black-box setting, the victim model is hidden from the attacker.…”

Section: Introduction (mentioning)

confidence: 99%


Adversarial Attack across Datasets

Qin, Xiong, Yi, et al. 2021. Preprint.


It has been observed that Deep Neural Networks (DNNs) are vulnerable to transfer attacks in the query-free black-box setting. However, all previous studies on transfer attacks assume that the white-box surrogate models possessed by the attacker and the black-box victim models are trained on the same dataset, which means the attacker implicitly knows the label set and the input size of the victim model. This assumption is usually unrealistic, as the attacker may not know the dataset used by the victim model, and further, the attacker needs to attack any randomly encountered images that may not come from the same dataset. Therefore, in this paper we define a new Generalized Transferable Attack (GTA) problem where we assume the attacker has a set of surrogate models trained on different datasets (with different label sets and image sizes), and none of them is equal to the dataset used by the victim model. We then propose a novel method called Image Classification Eraser (ICE) to erase classification information for any encountered images from an arbitrary dataset. Extensive experiments on Cifar-10, Cifar-100, and TieredImageNet demonstrate the effectiveness of the proposed ICE on the GTA problem. Furthermore, we show that existing transfer attack methods can be modified to tackle the GTA problem, but with significantly worse performance compared with ICE.


“…Ilyas et al [8] picked a target image and fine-tuned it toward the original image. Cheng et al [9,10] applied randomized gradient-free ZOO techniques.…”

Section: Related Work (mentioning)

confidence: 99%

“…Depending on the knowledge about the DNNs that the attackers have, adversarial attacks can be classified into white-box attacks [1, 3-5] and black-box attacks [6-13]. The former assumes that the attackers have complete knowledge of the deep network, while the latter assumes that the attackers have limited knowledge, typically some output information of the DNNs.…”

Section: Introduction (mentioning)

confidence: 99%

Mitigating Black-Box Adversarial Attacks via Output Noise Perturbation

Aithal, Li. 2021. Preprint.


In black-box adversarial attacks, adversaries query the deep neural network (DNN), use the output to reconstruct gradients, and then optimize the adversarial inputs iteratively. In this paper, we study the method of adding white noise to the DNN output to mitigate such attacks, with a unique focus on the trade-off analysis of noise level and query cost. The attacker's query count (QC) is derived mathematically as a function of noise standard deviation. With this result, the defender can conveniently find the noise level needed to mitigate attacks for the desired security level specified by QC and limited DNN performance loss. Our analysis shows that the added noise is drastically magnified by the small variation of DNN outputs, which makes the reconstructed gradient have an extremely low signal-to-noise ratio (SNR). Adding slight white noise with a standard deviation less than 0.01 is enough to increase QC by many orders of magnitude without introducing any noticeable classification accuracy reduction. Our experiments demonstrate that this method can effectively mitigate both soft-label and hard-label black-box attacks under realistic QC constraints. We also show that this method outperforms many other defense methods and is robust to the attacker's countermeasures.
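The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `noisy_output` and `fd_noise_std`, the two-query finite-difference estimator, and the example numbers are all assumptions chosen for illustration. The sketch adds zero-mean Gaussian noise to a model's output scores and computes how much of that noise ends up in a finite-difference gradient estimate.

```python
import math
import random

def noisy_output(probs, sigma, rng):
    """Return the output scores with zero-mean Gaussian noise added (hypothetical defender)."""
    return [p + rng.gauss(0.0, sigma) for p in probs]

def fd_noise_std(sigma, h):
    """Noise std in a two-query finite-difference estimate (F(x + h) - F(x)) / h,
    assuming each query carries independent N(0, sigma^2) noise."""
    return math.sqrt(2.0) * sigma / h

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
clean = [0.70, 0.20, 0.10]             # illustrative output scores, not real model output
noisy = noisy_output(clean, sigma=0.01, rng=rng)

# The top-1 class is unchanged, but a finite-difference estimate with step
# h = 1e-3 carries noise with std sqrt(2) * 0.01 / 1e-3, roughly 14.1.
print(noisy, fd_noise_std(0.01, 1e-3))
```

Under these assumptions, noise that is far too small to change the predicted class is divided by the small step size in the attacker's estimate, which is one way to read the abstract's claim that the noise is magnified in the reconstructed gradient.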


Is Robustness the Cost of Accuracy? – A Comprehensive Study on the Robustness of 18 Deep Image Classification Models

Su, Zhang, Chen, et al. 2018. Lecture Notes in Computer Science.


The prediction accuracy has been the long-lasting and sole standard for comparing the performance of different image classification models, including the ImageNet competition. However, recent studies have highlighted the lack of robustness in well-trained deep neural networks to adversarial examples. Visually imperceptible perturbations to natural images can easily be crafted and mislead the image classifiers towards misclassification. To demystify the trade-offs between robustness and accuracy, in this paper we thoroughly benchmark 18 ImageNet models using multiple robustness metrics, including the distortion, success rate and transferability of adversarial examples between 306 pairs of models. Our extensive experimental results reveal several new insights: (1) linear scaling law: the empirical $\ell_2$ and $\ell_\infty$ distortion metrics scale linearly with the logarithm of classification error; (2) model architecture is a more critical factor to robustness than model size, and the disclosed accuracy-robustness Pareto frontier can be used as an evaluation criterion for ImageNet model designers; (3) for a similar network architecture, increasing network depth slightly improves robustness in $\ell_\infty$ distortion; (4) there exist models (in VGG family) that exhibit high adversarial transferability, while most adversarial examples crafted from one model can only be transferred within the same family. Experiment code is publicly available at https://github.com/huanzhang12/Adversarial_Survey.

…where $f(x, t)$ is a loss function to measure the distance between the prediction of $x$ and the target label $t$. In this work, we choose $f(x, t) = \max\{\max_{i \neq t} [\mathrm{Logit}(x)]_i - [\mathrm{Logit}(x)]_t, -\kappa\}$
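The loss quoted at the end of this entry compares the largest non-target logit with the target logit and clips the result at −κ. A minimal sketch in plain Python follows. The function name `margin_loss` and the list-of-floats input are assumptions for illustration; the source gives only the formula.

```python
def margin_loss(logits, target, kappa=0.0):
    """f(x, t) = max{ max_{i != t} logit_i - logit_t, -kappa }.

    `logits` stands in for Logit(x) as a list of floats; `target` is the index t.
    """
    # Largest logit among all classes other than the target.
    best_other = max(v for i, v in enumerate(logits) if i != target)
    # Negative once the target class leads; clipped below at -kappa.
    return max(best_other - logits[target], -kappa)

# The target class (index 1) already leads by 1.0, so the loss is clipped at -kappa.
print(margin_loss([1.0, 3.0, 2.0], target=1, kappa=0.5))  # -0.5
# The target class (index 0) trails the best other class by 2.0.
print(margin_loss([1.0, 3.0, 2.0], target=0))             # 2.0
```

In this form, a larger κ asks for a wider margin in favor of the target class before the loss stops decreasing.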
