A Pascal challenge on monaural multi-talker speech recognition was developed to target the problem of robust automatic speech recognition against speech-like noise, which significantly degrades the performance of automatic speech recognition systems. In this challenge, two competing speakers say a simple command simultaneously, and the objective is to recognize the speech of the target speaker. Surprisingly, during the challenge a team from IBM Research achieved performance better than that of human listeners on this task. The IBM team's method consists of an intermediate speech separation step followed by single-talker speech recognition. This paper reconsiders the task of this challenge based on gain-adapted factorial speech processing models. It develops a joint token-passing algorithm that directly decodes the utterances of the target and masker speakers simultaneously. Compared to the challenge winner, it uses maximum uncertainty during decoding, which is not possible in the earlier two-phase method. It provides a detailed derivation of inference on these models based on the general inference procedures of probabilistic graphical models. As another improvement, it uses deep neural networks for joint speaker identification and gain estimation, which makes these two steps easier than before while producing competitive results for them. The proposed method outperforms the past superhuman results and even the results recently achieved by Microsoft Research using deep neural networks. It achieves a 5.5% absolute improvement in task performance over the first superhuman system and a 2.7% absolute improvement over its recent competitor.
The two key concepts of information literacy and self-efficacy are of the utmost importance in information searches, especially in new information and media environments such as the web. As a result, the sense of efficacy related to the information literacy skills of users should be regarded as a real concern. The article reports on research regarding Information Literacy Self-efficacy dimensions in a sample of post-graduate students at Shahed University, Tehran, Iran. A survey-descriptive method using a highly validated 28-item scale developed by Kurbanoglu, Akkoyunlu and Umay (
This paper investigates the effectiveness of factorial speech processing models in noise-robust automatic speech recognition tasks. For this purpose, the paper proposes an idealistic approach for modeling the state-conditional observation distribution of factorial models based on weighted stereo samples. This approach extends the earlier single-pass retraining method for ideal model compensation so that it supports multiple audio sources. Non-stationary noise can be treated as one of these audio sources, with multiple states. Experiments on set A of the Aurora 2 dataset show that recognition performance can be improved by modeling noise in this way. The improvement is significant in low signal-to-noise conditions, reaching up to 4% absolute word recognition accuracy. In addition to representing the state-conditional observation distribution accurately, the proposed method has an important advantage over previous methods: it allows the feature spaces for the source features and the corrupted features to be selected independently. This opens a new window for seeking feature spaces better suited to noisy speech, independent of the clean speech features.
An Intelligent Tutoring System (ITS) is a computer-based instruction tool that attempts to provide individualized instruction based on the learner's educational status. Progress in the development of these systems has risen and fallen since their emergence. Perhaps the main reason for this is the absence of an appropriate framework for ITS development. This paper proposes a framework for designing two main parts of ITSs. Besides the lack of a development framework, the second main reason for the lack of significant advances in ITS development is its cost. In general, the cost of developing instructional material is quite high, and it becomes higher still in ITS development. The proposed method can significantly reduce the development cost. The cost reduction is mainly due to the characteristics of the applied mapping techniques. These maps are human-readable and easily understood by people who are not familiar with knowledge representation techniques. The proposed framework is implemented for a graduate course at a technical university in Asia. This experiment provides individualized instruction, which is the main design purpose of ITSs.