Despite the enormous performance of deep neural networks (DNNs), recent studies have shown their vulnerability to adversarial examples (AEs), i.e., carefully perturbed inputs designed to fool the targeted DNN. Currently, the literature is rich with many effective attacks to craft such AEs. Meanwhile, many defense strategies have been developed to mitigate this vulnerability. However, these latter showed their effectiveness against specific attacks and does not generalize well to different attacks. In this paper, we propose a framework for defending DNN classifier against adversarial samples. The proposed method is based on a two-stage framework involving a separate detector and a denoising block. The detector aims to detect AEs by characterizing them through the use of natural scene statistic (NSS), where we demonstrate that these statistical features are altered by the presence of adversarial perturbations. The denoiser is based on block matching 3D (BM3D) filter fed by an optimum threshold value estimated by a convolutional neural network (CNN) to project back the samples detected as AEs into their data manifold. We conducted a complete evaluation on three standard datasets, namely MNIST, CIFAR-10 and Tiny-ImageNet. The experimental results show that the proposed defense method outperforms the state-of-the-art defense techniques by improving the robustness against a set of attacks under black-box, gray-box and white-box settings. The source code is available at: https://github.com/kherchouche-anouar/2DAE.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.