Historically, steganographic schemes were designed to preserve image statistics or steganalytic features. Since most state-of-the-art steganalytic methods employ a machine learning (ML) based classifier, it is reasonable to consider countering steganalysis by trying to fool the ML classifiers. However, simply applying perturbations to stego images as adversarial examples may cause data extraction to fail and may introduce unexpected artefacts detectable by other classifiers. In this paper, we present a steganographic scheme with a novel operation called adversarial embedding, which hides a stego message while at the same time fooling a convolutional neural network (CNN) based steganalyzer. The proposed method works under the conventional framework of distortion minimization. Adversarial embedding is achieved by adjusting the costs of image element modifications according to the gradients backpropagated from the CNN classifier targeted by the attack. As a result, the modification direction is more likely to match the sign of the gradient. In this way, the so-called adversarial stego images are generated. Experiments demonstrate that the proposed steganographic scheme is secure against the targeted adversary-unaware steganalyzer. In addition, it deteriorates the performance of other adversary-aware steganalyzers, opening the way to a new class of modern steganographic schemes capable of overcoming powerful CNN-based steganalysis.
Index Terms: Steganography, steganalysis, adversarial machine learning.
I. INTRODUCTION
Image steganography is the art and science of concealing covert information within images. It is usually achieved by modifying image elements, such as pixels or DCT coefficients. On the other side of the game, steganalysis aims to reveal the presence of secret information by detecting whether data embedding has left abnormal artefacts.
The history of steganography and steganalysis is rich in interesting stories, as the two compete with each other and both benefit and evolve from the competition [1]. The earliest steganographic method substituted the least significant bits of image elements with message bits. The stego artefacts introduced by this method can be effectively detected by the Chi-squared attack [2] or by steganalytic features based on first-order statistics [3]. In this initial phase of the competition, statistical hypothesis testing or a simple linear classifier such as FLD (Fisher Linear Discriminant) was sufficient for steganalysis.
The first-order statistics can be restored after data embedding, as done in [4]. The abnormal artefacts in the first-order statistics can also be avoided, as in [5], [6]. As a consequence, more powerful steganalytic features based on second-order statistics [7], [8] were proposed. In this period, advanced machine learning (ML) tools, such as SVM (Support Vector Machine), were applied to high-dimensional features (where the dimension is typically several hundred). These met...