Audio-visual speech recognition (AVSR) is thought to be one of the most promising solutions for reliable speech recognition, particularly when the audio is corrupted by noise. However, careful selection of sensory features is crucial for attaining high recognition performance. In the machine-learning community, deep learning approaches have recently attracted increasing attention because deep neural networks can effectively extract robust latent features that enable various recognition algorithms to generalize well under diverse application conditions. This study introduces a connectionist-hidden Markov model (HMM) system for noise-robust AVSR. First, a deep denoising autoencoder is used to acquire noise-robust audio features. The training data for the network consist of pairs of multiple consecutive steps of deteriorated audio features and the corresponding clean features, so the network is trained to output denoised audio features from the corresponding features deteriorated by noise. Second, a convolutional neural network (CNN) is used to extract visual features from raw mouth area images. The training data for the CNN consist of pairs of raw images and the corresponding phoneme labels, so the network is trained to predict phoneme labels from the corresponding mouth area input images. Finally, a multi-stream HMM (MSHMM) is applied to integrate the audio and visual HMMs, which are trained independently with the respective features. Comparing normal and denoised mel-frequency cepstral coefficients (MFCCs) as audio features for the HMM, our unimodal isolated word recognition results demonstrate that a word recognition rate gain of approximately 65% is attained with denoised MFCCs under a 10 dB signal-to-noise ratio (SNR) for the audio signal input.
Moreover, our multimodal isolated word recognition results utilizing MSHMM with denoised MFCCs and acquired visual features demonstrate that an additional word recognition rate gain is attained for the SNR conditions below 10 dB.
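The integration step can be illustrated with a minimal sketch of stream weighting. In the common multi-stream HMM formulation, the audio and visual log-likelihoods are combined as a weighted sum whose weights add up to one. The abstract does not give the weights or the exact combination rule used in the study, so the code below is a simplified decision-level version: it fuses whole-word scores rather than per-state emission scores. The function names, the words, and all score values are hypothetical.

```python
def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Weighted sum of per-word log-likelihoods from two streams.

    audio_ll, visual_ll: dicts mapping each candidate word to the
    log-likelihood from the audio HMM and the visual HMM (hypothetical inputs).
    audio_weight: stream weight in [0, 1]; the visual weight is 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return {
        word: audio_weight * audio_ll[word] + visual_weight * visual_ll[word]
        for word in audio_ll
    }


def recognize(audio_ll, visual_ll, audio_weight):
    """Return the candidate word with the highest fused score."""
    fused = fuse_log_likelihoods(audio_ll, visual_ll, audio_weight)
    return max(fused, key=fused.get)


# Made-up scores: the noisy audio stream slightly prefers the wrong word,
# while the visual stream clearly prefers the right one.
audio_scores = {"hello": -120.0, "yellow": -118.0}
visual_scores = {"hello": -80.0, "yellow": -95.0}

audio_only = recognize(audio_scores, visual_scores, audio_weight=1.0)
balanced = recognize(audio_scores, visual_scores, audio_weight=0.5)
```

Lowering the audio weight shifts the decision toward the visual stream, which is the intuition behind relying more on visual features as the SNR drops.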
The present study clearly demonstrates that CYP11B2 immunostaining is a powerful tool for histopathological diagnosis of aldosterone overproduction in PA and for subtype classification of APA, multiple APCCs, unilateral multiple adrenocortical micronodules, and diffuse hyperplasia.
This letter reports synchronization phenomena in, and mathematical modeling of, a frustrated system of living beings, namely Japanese tree frogs (Hyla japonica). While an isolated male Japanese tree frog calls nearly periodically, he can hear sounds including the calls of other males. Therefore, the spontaneous calling behavior of interacting males can be understood as a system of coupled oscillators. We construct a simple but biologically reasonable model based on the experimental results for two frogs, extend the model to a system of three frogs, and theoretically predict the occurrence of rich synchronization phenomena, such as triphase synchronization and 1:2 antiphase synchronization. In addition, we experimentally verify the theoretical prediction through ethological experiments on the calling behavior of three frogs and time series analysis of the recorded sound data. Note that the calling behavior of three male Japanese tree frogs is frustrated because almost perfect antiphase synchronization is robustly observed in a system of two male frogs. Thus, the nonlinear dynamics of the three-frog system should be far from trivial.
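The coupled-oscillator picture can be sketched numerically. The code below is an illustration under stated assumptions, not the model from the letter: each frog is treated as an identical phase oscillator, and every pair interacts through a repulsive sine coupling, K sin(θ_i − θ_j), chosen only because it makes antiphase calling stable for two frogs. The natural frequency, coupling strength, step size, and function names are all hypothetical.

```python
import math


def simulate(phases, omega=2 * math.pi * 4.0, k=1.0, dt=0.001, steps=20000):
    """Euler-integrate identical phase oscillators with repulsive sine coupling.

    phases: initial phases in radians, one per frog.
    omega:  assumed natural calling frequency in rad/s.
    k:      assumed coupling strength; k > 0 pushes phases apart.
    """
    theta = list(phases)
    n = len(theta)
    for _ in range(steps):
        rates = [
            omega + k * sum(math.sin(theta[i] - theta[j]) for j in range(n) if j != i)
            for i in range(n)
        ]
        theta = [(t + dt * r) % (2 * math.pi) for t, r in zip(theta, rates)]
    return theta


def phase_gaps(theta):
    """Return the gaps between neighbouring phases on the circle, sorted."""
    ordered = sorted(theta)
    n = len(ordered)
    gaps = [(ordered[(i + 1) % n] - ordered[i]) % (2 * math.pi) for i in range(n)]
    return sorted(gaps)


two_frog_gaps = phase_gaps(simulate([0.0, 0.5]))
three_frog_gaps = phase_gaps(simulate([0.0, 0.5, 1.7]))
```

Under these assumptions, two frogs settle half a cycle apart. Three frogs cannot all be in antiphase with one another, which is the frustration the abstract refers to, and with symmetric coupling they spread evenly around the cycle instead, a triphase pattern. This simple symmetric sketch is not meant to reproduce the 1:2 antiphase state.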