Intelligent imaging sensors in the IoT benefit greatly from the continuous evolution of deep neural networks (DNNs). However, the emergence of adversarial examples has raised skepticism about the trustworthiness of DNNs: malicious perturbations, even ones imperceptible to humans, can incapacitate a DNN, creating security problems for the information integration of an IoT system. Adversarial example detection is an intuitive solution that judges whether an input is malicious before it is accepted. However, existing detection approaches suffer, to varying degrees, from shortcomings such as (1) modifying the network structure, (2) requiring extra training before deployment, and (3) requiring prior knowledge about the attacks. To address these problems, this paper proposes a novel framework that filters out adversarial perturbations by superimposing on the original images Gaussian noise shaped by a gradient-independent visualization method, the score-weighted class activation map (Score-CAM). We propose to trim the Gaussian noise in a way that carries more explicit semantic meaning and stronger explainability, in contrast to previous studies based on intuitive hypotheses or handcrafted denoisers. Our framework requires no extra training and no gradient computation, making it friendly to embedded devices with inference-only capabilities. Extensive experiments demonstrate that the proposed framework is general enough to detect a wide range of attacks and can be applied to different models.
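The detection idea sketched in this abstract can be illustrated with a short PyTorch example. Everything below is an assumption-laden illustration rather than the paper's exact procedure: the simplified Score-CAM weighting, the choice of target `layer`, the noise scale `sigma`, and the prediction-flip criterion are all hypothetical choices made for exposition.

```python
# Hedged sketch: saliency-trimmed Gaussian noise as an adversarial-example probe.
# Assumptions (not from the paper): a PyTorch classifier over (C, H, W) images
# in [0, 1], a hand-picked convolutional `layer`, and the rule that a prediction
# flip under trimmed noise flags the input as adversarial.
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_cam(model, x, layer, class_idx):
    """Simplified Score-CAM: weight each activation map of `layer` by the
    confidence the model assigns to `class_idx` on the masked input."""
    acts = {}
    handle = layer.register_forward_hook(
        lambda mod, inp, out: acts.update(a=out.detach()))
    model(x[None])                      # forward pass just to capture activations
    handle.remove()

    cam = torch.zeros(x.shape[-2:], device=x.device)
    for a in acts["a"][0]:              # iterate over channel maps (h, w)
        m = F.interpolate(a[None, None], size=x.shape[-2:],
                          mode="bilinear", align_corners=False)[0, 0]
        if m.max() > m.min():
            m = (m - m.min()) / (m.max() - m.min())  # normalize map to [0, 1]
        score = F.softmax(model((x * m)[None]), dim=1)[0, class_idx]
        cam += score * m
    cam = torch.relu(cam)
    return cam / cam.max().clamp(min=1e-8)           # saliency in [0, 1]

@torch.no_grad()
def looks_adversarial(model, x, layer, sigma=0.1):
    """Superimpose Gaussian noise trimmed by the Score-CAM saliency and flag
    the input if the predicted label changes (`sigma` is illustrative)."""
    y0 = model(x[None]).argmax(1)
    saliency = score_cam(model, x, layer, y0.item())
    noisy = (x + sigma * torch.randn_like(x) * saliency).clamp(0.0, 1.0)
    return bool(model(noisy[None]).argmax(1) != y0)
```

Note that the sketch uses only forward passes, which matches the abstract's claim of needing no extra training or gradient calculation. The abstract does not specify whether "trimming" keeps noise in salient regions or suppresses it there, so the masking direction above is an assumption.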
Adversarial examples have raised public concern about the robustness of deep neural networks (DNNs). One universal approach to enhancing robustness is adversarial training, which essentially augments the training data with adversarial examples. However, adversarial training has succeeded only to a limited extent, in part because of the lack of interpretation and understanding of the robustness of DNNs. In this context, we attempt to explain adversarial robustness by embedding the sample points onto a hypersphere, which naturally provides an interpretable metric for the distance between sample points. Contrary to empirical intuition, we observe that adversarially trained models show complex patterns across different datasets and training configurations. We make observations about, and offer explanations for, robustness and model behavior from the perspective of the distances between sample points. Lastly, we discuss the degradation of standard accuracy in adversarially trained models and suggest possible remedies.
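The hypersphere embedding supplies a closed-form distance. As a minimal sketch (assuming the features come from some fixed embedding such as a network's penultimate layer, which the abstract does not pin down), the geodesic distance between two points projected onto the unit hypersphere is simply the angle between them:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sphere_distance(feat_a, feat_b):
    """Geodesic (angular) distance between feature vectors after projecting
    them onto the unit hypersphere; returns values in [0, pi]."""
    a = F.normalize(feat_a, dim=-1)          # project onto the unit sphere
    b = F.normalize(feat_b, dim=-1)
    cos = (a * b).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.acos(cos)

# Illustrative use (`embed` is a hypothetical feature extractor): compare a
# clean sample's embedding with its adversarially perturbed counterpart.
# d = sphere_distance(embed(x_clean), embed(x_adv))
```

A distance of this form is scale-invariant by construction, which is one way such an embedding can serve as an interpretable metric for comparing clean and perturbed samples.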