Since the BOSS competition in 2010, most steganalysis approaches have used a two-step learning methodology: feature extraction, such as the Rich Models (RM), for the image representation, and an Ensemble Classifier (EC) for the learning step. In 2015, Qian et al. showed that a deep learning approach that jointly learns and computes the features is very promising for steganalysis. In this paper, we follow up on the study of Qian et al. and show that, in the scenario where the steganographer always uses the same embedding key when embedding with the simulator in the different images, the results obtained with a Convolutional Neural Network (CNN) or a Fully Connected Neural Network (FNN), if well parameterized, surpass the conventional use of an RM with an EC, thanks to the intrinsic joint minimization and the preservation of spatial information. First, numerous experiments were conducted in order to find the best "shape" of the CNN. Second, experiments were carried out in the clairvoyant scenario in order to compare the CNN and FNN to an RM with an EC. The results show more than a 16% reduction in the classification error with our CNN or FNN. Third, experiments were also performed in a cover-source mismatch setting. The results show that the CNN and FNN are naturally robust to the mismatch problem. In addition to the experiments, we provide discussions on the internal mechanisms of a CNN and draw links with some previously stated ideas in order to understand the results we obtained. We also discuss the "same embedding key" scenario.
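As an illustration of the kind of network described above (a minimal sketch under our own assumptions, not the authors' exact architecture), a steganalysis CNN typically starts with a fixed high-pass filter so that learning focuses on the stego noise rather than the image content:

```python
# Minimal sketch of a spatial-domain steganalysis CNN (illustrative only).
# Assumes 256x256 grayscale inputs and a binary cover/stego decision.
import torch
import torch.nn as nn

class StegoCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Fixed 5x5 high-pass ("KV") kernel commonly used as a preprocessing layer.
        kv = torch.tensor([[-1.,  2.,  -2.,  2., -1.],
                           [ 2., -6.,   8., -6.,  2.],
                           [-2.,  8., -12.,  8., -2.],
                           [ 2., -6.,   8., -6.,  2.],
                           [-1.,  2.,  -2.,  2., -1.]]) / 12.0
        self.hpf = nn.Conv2d(1, 1, 5, padding=2, bias=False)
        self.hpf.weight.data = kv.view(1, 1, 5, 5)
        self.hpf.weight.requires_grad = False  # kept fixed during training
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 5, padding=2), nn.Tanh(), nn.AvgPool2d(5, stride=2, padding=2),
            nn.Conv2d(8, 16, 5, padding=2), nn.Tanh(), nn.AvgPool2d(5, stride=2, padding=2),
            nn.Conv2d(16, 32, 3, padding=1), nn.Tanh(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 2)  # cover vs. stego

    def forward(self, x):
        x = self.hpf(x)
        x = self.features(x).flatten(1)
        return self.classifier(x)

# Usage: logits = StegoCNN()(torch.randn(4, 1, 256, 256))
```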
In the field of remote sensing, it is very common to use data from several sensors in order to perform classification or segmentation. Most standard remote sensing analyses use machine learning methods based on image descriptors such as HOG or SIFT combined with a classifier such as an SVM. In recent years, neural networks have emerged as a key tool for object detection. Due to the heterogeneity of the information (optical, infrared, LiDAR), the combination of multi-source data is still an open issue in the remote sensing field. In this paper, we focus on managing data from multiple sources for the localization of urban trees in multi-source (optical, infrared, DSM) aerial images, and we evaluate how different preprocessing choices on the input data affect a CNN.
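One simple way to feed multi-source data to a CNN, sketched below under our own assumptions (names and the normalization choice are illustrative, not the paper's pipeline), is to stack co-registered rasters as extra input channels:

```python
# Illustrative sketch: stacking co-registered RGB, infrared, and DSM rasters into a
# single multi-channel CNN input, with per-channel standardization as one possible
# preprocessing choice (heights and reflectances live on very different scales).
import numpy as np

def stack_sources(rgb, infrared, dsm):
    """rgb: (H, W, 3), infrared: (H, W), dsm: (H, W) -> (H, W, 5) float32 array."""
    channels = [rgb[..., i] for i in range(3)] + [infrared, dsm]
    out = []
    for c in channels:
        c = c.astype(np.float32)
        out.append((c - c.mean()) / (c.std() + 1e-8))  # per-channel standardization
    return np.stack(out, axis=-1)

# Usage:
# x = stack_sources(rgb_patch, ir_patch, dsm_patch)   # shape (H, W, 5)
```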
ABSTRACT:Urban growth is an ongoing trend and one of its direct consequences is the development of buried utility networks. Locating these networks is becoming a challenging task. While the labeling of large objects in aerial images is extensively studied in Geosciences, the localization of small objects (smaller than a building) is in counter part less studied and very challenging due to the variance of object colors, cluttered neighborhood, non-uniform background, shadows and aspect ratios. In this paper, we put forward a method for the automatic detection and localization of manhole covers in Very High Resolution (VHR) aerial and remotely sensed images using a Convolutional Neural Network (CNN). Compared to other detection/localization methods for small objects, the proposed approach is more comprehensive as the entire image is processed without prior segmentation. The first experiments using the Prades-Le-Lez and Gigean datasets show that our method is indeed effective as more than 49% of the ground truth database is detected with a precision of 75%. New improvement possibilities are being explored such as using information on the shape of the detected objects and increasing the types of objects to be detected, thus enabling the extraction of more object specific features.
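A common way to localize small objects without prior segmentation is to scan the full image with a patch-level classifier; the sketch below is an assumed, generic version of such a scan (the `model`, patch size, and threshold are placeholders, not values from the paper):

```python
# Illustrative sketch: dense scanning of a large VHR image with a patch-level CNN
# classifier; high-score positions are kept as candidate small-object detections.
# `model` is any callable returning an object probability for a single patch.
import numpy as np

def detect_small_objects(image, model, patch=32, stride=16, threshold=0.9):
    """image: (H, W, C) array; returns a list of (row, col, score) candidates."""
    H, W = image.shape[:2]
    detections = []
    for r in range(0, H - patch + 1, stride):
        for c in range(0, W - patch + 1, stride):
            score = float(model(image[r:r + patch, c:c + patch]))
            if score >= threshold:
                # Report the patch center as the candidate object location.
                detections.append((r + patch // 2, c + patch // 2, score))
    return detections
```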
Meetings are a common activity that poses challenges for systems designed to assist them. One such challenge is speaker recognition, which can provide useful information for modeling human interaction or for human-robot interaction. Speaker recognition is mostly done using speech; however, visual and contextual information can provide additional insights. In this paper, we propose a speaker detection framework that integrates audiovisual features with social information from the meeting context. Visual cues are processed using a Convolutional Neural Network (CNN) that captures spatio-temporal relationships. We analyse several CNN architectures with both cues: raw pixels (RGB images) and motion (estimated with optical flow). Contextual reasoning is done with an original methodology based on the gaze of all participants. We evaluate our proposal on a public state-of-the-art benchmark: the AMI corpus. We show how the addition of visual and context information improves speaker recognition performance.
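To make the multi-cue idea concrete, here is a minimal sketch (our own assumed architecture, not the authors' model) of late fusion between an RGB stream, an optical-flow stream, and a scalar gaze-based context feature for a per-participant speaking/not-speaking decision:

```python
# Illustrative sketch: two-stream CNN (appearance + motion) fused with a gaze-based
# context score for binary speaker detection.
import torch
import torch.nn as nn

def conv_stream(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )

class AVSpeakerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb = conv_stream(3)    # appearance stream (RGB crop of the participant)
        self.flow = conv_stream(2)   # motion stream (x/y optical-flow components)
        self.head = nn.Linear(32 + 32 + 1, 2)  # +1 for the gaze context feature

    def forward(self, rgb, flow, gaze_score):
        z = torch.cat([self.rgb(rgb), self.flow(flow), gaze_score], dim=1)
        return self.head(z)

# Usage:
# net = AVSpeakerNet()
# logits = net(torch.randn(4, 3, 64, 64), torch.randn(4, 2, 64, 64), torch.rand(4, 1))
```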