Abstract: We describe a novel general strategy for building steganography detectors for digital images. The process starts with assembling a rich model of the noise component as a union of many diverse submodels formed by joint distributions of neighboring samples from quantized image noise residuals obtained using linear and non-linear high-pass filters. In contrast to previous approaches, we make the model assembly a part of the training process driven by samples drawn from the corresponding cover and stego sources. Ensemble classifiers are used to assemble the model as well as the final steganalyzer due to their low computational complexity and ability to work efficiently with high-dimensional feature spaces and large training sets. We demonstrate the proposed framework on three steganographic algorithms designed to hide messages in images represented in the spatial domain: HUGO, the edge-adaptive algorithm by Luo et al. [32], and optimally coded ternary ±1 embedding. For each algorithm, we apply a simple submodel-selection technique to increase the detection accuracy per model dimensionality and show how the detection saturates with increasing complexity of the rich model. Observing how different submodels engage in detection reveals an interesting interplay between the embedding and detection. Steganalysis built around rich image models combined with ensemble classifiers is a promising direction toward automating steganalysis for a wide spectrum of steganographic schemes.
I. Introduction
Modern feature-based steganalysis starts with adopting an image model (a low-dimensional representation) within which steganalyzers are built using machine learning tools. The model is usually determined not only by the characteristics of the cover source but also by the effects of embedding [4] [33], which was seemingly proposed from a pure cover model, is in reality, too, driven by a specific case of steganography.
The choice of the order of the Markov process as well as the threshold T and the
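The submodel construction described in the abstract above (a quantized, truncated noise residual summarized by the joint distribution of neighboring samples) can be illustrated with a minimal sketch. It is not the authors' implementation: the horizontal first-order difference filter, the function names, and the default values of the quantization step `q`, the threshold `T`, and the co-occurrence `order` are all assumptions chosen for illustration.

```python
from itertools import product


def first_order_residual(image):
    """Horizontal first-order residual R[i][j] = X[i][j+1] - X[i][j].

    This simple high-pass filter is an illustrative assumption; a rich
    model would combine many different linear and non-linear filters.
    """
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in image]


def quantize_truncate(value, q=1.0, T=2):
    """Round value / q to an integer and clip the result to [-T, T]."""
    r = int(round(value / q))  # note: Python rounds exact halves to even
    return max(-T, min(T, r))


def cooccurrence(residual, order=2, q=1.0, T=2):
    """Normalized joint histogram of `order` horizontally adjacent
    quantized residual samples; it has (2T + 1) ** order bins."""
    counts = {key: 0 for key in product(range(-T, T + 1), repeat=order)}
    total = 0
    for row in residual:
        quantized = [quantize_truncate(v, q, T) for v in row]
        for j in range(len(quantized) - order + 1):
            counts[tuple(quantized[j:j + order])] += 1
            total += 1
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


# Toy 2x4 "image" of pixel values (hypothetical data).
image = [[10, 12, 11, 11], [5, 5, 9, 8]]
features = cooccurrence(first_order_residual(image), order=2, q=1.0, T=2)
```

Each such histogram would be one submodel; a rich model in the sense of the abstract is a union of many of them, built from different filters and parameter choices.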
Abstract: Today, the most accurate steganalysis methods for digital media are built as supervised classifiers on feature vectors extracted from the media. The tool of choice for the machine learning seems to be the support vector machine (SVM). In this paper, we propose an alternative and well-known machine learning tool, ensemble classifiers implemented as random forests, and argue that they are ideally suited for steganalysis. Ensemble classifiers scale much more favorably w.r.t. the number of training examples and the feature dimensionality, with performance comparable to the much more complex SVMs. The significantly lower training complexity opens up the possibility for the steganalyst to work with rich (high-dimensional) cover models and train on larger training sets, two key elements that appear necessary to reliably detect modern steganographic algorithms. Ensemble classification is portrayed here as a powerful developer tool that allows fast construction of steganography detectors with markedly improved detection accuracy across a wide range of embedding methods. The power of the proposed framework is demonstrated on three steganographic methods that hide messages in JPEG images.
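The scaling argument above rests on training many weak learners, each on a small random subset of the features, and fusing their decisions by voting. The sketch below shows that idea under loud assumptions: it uses decision stumps on bootstrap samples as base learners, which is a deliberately simplified stand-in and not the classifier used in the paper, and all function names, parameters, and the toy data are hypothetical.

```python
import random


def train_stump(X, y, dims):
    """Exhaustively pick the (feature, threshold, polarity) among `dims`
    with the fewest errors on the given training sample."""
    best = None
    for d in dims:
        for t in sorted({x[d] for x in X}):
            for polarity in (1, -1):
                errors = sum(
                    (1 if polarity * (x[d] - t) > 0 else 0) != label
                    for x, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, d, t, polarity)
    return best[1:]


def stump_predict(stump, x):
    """Return 1 (stego) or 0 (cover) for a single feature vector."""
    d, t, polarity = stump
    return 1 if polarity * (x[d] - t) > 0 else 0


def train_ensemble(X, y, n_learners=25, subspace_dim=2, seed=0):
    """Train each base learner on a random feature subspace and a
    bootstrap sample of the training set."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    ensemble = []
    for _ in range(n_learners):
        dims = rng.sample(range(dim), subspace_dim)
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(
            train_stump([X[i] for i in idx], [y[i] for i in idx], dims)
        )
    return ensemble


def ensemble_predict(ensemble, x):
    """Fuse the base learners' decisions by majority vote."""
    votes = sum(stump_predict(s, x) for s in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0


# Toy, well-separated data: label 0 = cover, label 1 = stego.
data_rng = random.Random(1)
cover = [[data_rng.random() for _ in range(6)] for _ in range(20)]
stego = [[5 + data_rng.random() for _ in range(6)] for _ in range(20)]
model = train_ensemble(cover + stego, [0] * 20 + [1] * 20, n_learners=11)
```

Because every base learner sees only `subspace_dim` features, the cost of training one learner does not grow with the full feature dimensionality, which is the property the abstract credits for making rich models practical.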
In this paper, we propose a rich model of DCT coefficients in a JPEG file for the purpose of detecting steganographic embedding changes. The model is built systematically as a union of smaller submodels formed as joint distributions of DCT coefficients from their frequency and spatial neighborhoods covering a wide range of statistical dependencies. Due to its high dimensionality, we combine the rich model with ensemble classifiers and construct detectors for six modern JPEG domain steganographic schemes: nsF5, model-based steganography, YASS, and schemes that use side information at the embedder in the form of the uncompressed image: MME, BCH, and BCHopt. The resulting performance is contrasted with previously proposed feature sets of both low and high dimensionality. We also investigate the performance of individual submodels when grouped by their type as well as the effect of Cartesian calibration. The proposed rich model delivers superior performance across all tested algorithms and payloads.
MOTIVATION
Modern image-steganography detectors consist of two basic parts: an image model and a machine learning tool that is trained to distinguish between cover and stego images represented in the chosen model. The detection accuracy is primarily determined by the image model, which should be sensitive to steganographic embedding changes and insensitive to the image content. It is also important that it captures as many dependencies among individual image elements (DCT coefficients) as possible to increase the chance that at least some of these dependencies will be disturbed by embedding. By measuring mutual information between coefficient pairs, it has already been pointed out [9] that the strongest dependencies among DCT coefficients are between close spatial-domain (inter-block) and frequency-domain (intra-block) neighbors.
This fact was intuitively utilized by numerous researchers in the past, who proposed to represent JPEG images using joint or conditional probability distributions of neighboring coefficient pairs [1,3,14,15,17,22] possibly expanded with their calibrated versions [8,11]. In [9,10], the authors pointed out that by merging many such joint distributions (co-occurrence matrices), substantial improvement in detection accuracy can be obtained if combined with machine learning that can handle high model dimensionality and large training sets.
In this paper, we propose a complex (rich) model of JPEG images consisting of a large number of individual submodels. The novelty w.r.t. our previous contributions [9,10] is at least three-fold: 1) we view the absolute values of DCT coefficients in a JPEG image as 64 weakly dependent parallel channels and separate the joint statistics by individual DCT modes; 2) to increase the model diversity, we form the same model from differences between absolute values of DCT coefficients; 3) we add integral joint statistics between coefficients from a wider range of values to cover the case when steganographic embedding largely avoids disturbing the first two models. Fi...
Currently, the most successful approach to steganography in empirical objects, such as digital media, is to cast the embedding problem as source coding with a fidelity constraint. The sender specifies the costs of changing each cover element and then embeds a given payload by minimizing the total embedding cost. Since efficient practical codes exist that embed near the rate-distortion bound, the remaining task for the steganographer is the fidelity measure, that is, the choice of the costs. In the past, the costs were obtained either in an ad hoc manner or determined from the effects of embedding in a chosen feature space. In this paper, we adopt a different strategy in which the cover is modeled as a sequence of independent but not necessarily identically distributed quantized Gaussians and the embedding change probabilities are derived to minimize the total KL divergence within the chosen model for a given embedding operation and payload. Despite the simplicity of the adopted model, the resulting stegosystem exhibits security that is comparable to current state-of-the-art methods across a wide range of payloads.
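The cost-based formulation in the second sentence above (embed a given payload while minimizing the total cost) can be sketched for a binary embedding operation. This illustrates only that generic payload-limited sender, not the KL-divergence-minimizing scheme the abstract goes on to propose. It assumes the standard result that the optimal change probabilities take the form p_i = exp(-λρ_i) / (1 + exp(-λρ_i)), with λ chosen so that the total entropy equals the payload; the function names and the cost values are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary change with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def change_probabilities(costs, lam):
    """p_i = exp(-lam * c_i) / (1 + exp(-lam * c_i)), written as
    1 / (1 + exp(lam * c_i)); the exponent is capped to avoid overflow."""
    return [1.0 / (1.0 + math.exp(min(lam * c, 700.0))) for c in costs]


def total_payload(costs, lam):
    """Payload in bits that the change probabilities for `lam` can carry."""
    return sum(binary_entropy(p) for p in change_probabilities(costs, lam))


def solve_lambda(costs, payload_bits, iterations=60):
    """Bisection on lam: the payload decreases monotonically as lam grows
    (for positive costs), from len(costs) bits at lam = 0 toward 0."""
    lo, hi = 0.0, 1.0
    while total_payload(costs, hi) > payload_bits and hi < 1e6:
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if total_payload(costs, mid) > payload_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical per-element costs and a target payload of 4 bits.
costs = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0]
lam = solve_lambda(costs, payload_bits=4.0)
probs = change_probabilities(costs, lam)
expected_cost = sum(p * c for p, c in zip(probs, costs))
```

In this formulation, elements with a higher cost receive a lower change probability, so the design effort shifts entirely to choosing the costs, which is the point the abstract makes.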