Adversarial examples in machine learning for images are widely publicized and explored. Illustrations of misclassifications caused by slightly perturbed inputs are abundant and commonly known (e.g., a picture of a panda imperceptibly perturbed to fool the classifier into incorrectly labeling it as a gibbon). Similar attacks on deep learning (DL) for radio frequency (RF) signals, and their mitigation strategies, are scarcely addressed in the published work. Yet RF adversarial examples (AdExs) with minimal waveform perturbations can cause drastic, targeted misclassification, particularly against spectrum sensing/survey applications (e.g., BPSK mistaken for 8-PSK). Our research on deep learning AdExs and the proposed defense mechanisms is RF-centric and incorporates physical-world, over-the-air (OTA) effects. We herein present defense mechanisms based on pre-training the target classifier using an autoencoder. Our results validate this approach as a viable method for mitigating adversarial attacks against deep learning-based communications and radar sensing systems.
I. INTRODUCTION

A new research direction is emerging in the field of wireless communications, aiming to develop and evaluate deep learning (DL) approaches against classical detection and estimation methods in the radio frequency (RF) realm. Spectrum sensing, especially in the context of cognitive radio, encompasses most of the radio-signal detection problems currently being addressed. DL in the RF domain differs greatly from common DL applications (e.g., image recognition, natural language processing) and requires specialized knowledge of RF signal processing and of wireless communications and/or radar, depending on how the signal is utilized.

While research on adversarial examples in machine learning for images has been prolific, similar attacks on deep learning of RF signals, and their mitigation strategies, are scarcely addressed in the published work, with only a couple of recent publications on RF [1], [2]. Adversarial examples (AdExs) are slightly perturbed inputs that are classified incorrectly by the machine learning (ML) model [3]. The perturbation is achieved by mathematical processing of the signal, e.g., by adding an incremental value in the direction of the classifier's gradient with respect to the inputs (as in the FGSM attack illustrated in Fig. 3 A), or by solving a constrained optimization problem. Popular DL models are even more vulnerable to AdExs because DL networks learn input-output mappings that are fairly discontinuous. Consider the images in Figure 1 [4]. The image on the left is the original image of a panda from the ImageNet dataset [5], while the one on the right is derived from it by applying an FGSM attack of very low intensity. The perturbation of 0.007 added in the direction of the loss gradient...

Fig. 1. Famous panda illustration of an adversarial image example against a DL classifier, where a visually imperceptible, noise-like perturbation can fool the classifier into labeling it as a gibbon.
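To make the FGSM perturbation described above concrete, the following is a minimal sketch, assuming a differentiable classifier implemented in PyTorch; the names model, x, y, and fgsm_perturb, as well as the step size epsilon, are illustrative and are not taken from the paper's implementation.

```python
# Illustrative FGSM sketch (assumes a trained, differentiable PyTorch classifier).
# x: input tensor (e.g., a batch of I/Q samples reshaped into real values)
# y: true integer class labels; epsilon: perturbation strength.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.007):
    """Perturb x by one step of size epsilon along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)          # classification loss at x_adv
    loss.backward()                                  # gradient of loss w.r.t. the input
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()  # FGSM step in the gradient-sign direction
    return x_adv.detach()
```

Feeding the perturbed input back to the classifier illustrates the attack: if successful, model(x_adv) is assigned a different class than model(x), even though the added perturbation is small relative to the waveform.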