We present a maximum-likelihood (ML) stochastic matching approach that decreases the acoustic mismatch between a test utterance and a given set of speech models, so as to reduce the recognition performance degradation caused by distortions in the test utterance and/or the model set. We assume that the speech signal is modeled by a set of subword hidden Markov models (HMMs) Λ_X. The mismatch between the observed test utterance Y and the models Λ_X can be reduced in two ways: 1) by an inverse distortion function F_ν(·) that maps Y into an utterance X that better matches the models Λ_X, and 2) by a model transformation function G_η(·) that maps Λ_X to a transformed model Λ_Y that better matches the utterance Y. We assume the functional form of the transformations F_ν(·) or G_η(·) and estimate the parameters ν or η in a maximum-likelihood manner using the expectation-maximization (EM) algorithm. The choice of the form of F_ν(·) or G_η(·) is based on our prior knowledge of the nature of the acoustic mismatch. The stochastic matching algorithm operates only on the given test utterance and the given set of speech models, and no additional training data is required to estimate the mismatch prior to actual testing. Experimental results are presented to study the properties of the proposed algorithm and to verify the efficacy of the approach in improving the performance of an HMM-based continuous speech recognition system in the presence of mismatch due to different transducers and transmission channels. The proposed stochastic matching algorithm is found to converge quickly. Further, recognition performance in mismatched conditions is greatly improved, while performance in matched conditions is well maintained. The stochastic matching algorithm reduced the word error rate by about 70% in mismatched conditions.
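To make the feature-space case concrete, the following is a minimal sketch under strong simplifying assumptions that are not taken from the abstract: the distortion is a single additive bias b (so F(y) = y - b), the HMM set is replaced by a diagonal-covariance Gaussian mixture, and the names `log_gauss_diag` and `estimate_bias` are hypothetical. It illustrates the general EM pattern (posteriors under the current bias, then a closed-form bias update), not the authors' implementation.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log-density of each frame under each diagonal Gaussian.

    x: (T, D) frames; means, variances: (K, D). Returns (T, K).
    """
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    return -0.5 * (log_norm[None, :] + mahal)

def estimate_bias(y, weights, means, variances, n_iter=10):
    """EM estimate of an additive bias b such that y - b fits the mixture.

    Assumption: the mismatch is a constant offset in the feature domain.
    """
    b = np.zeros(y.shape[1])
    inv_var = 1.0 / variances                                   # (K, D)
    for _ in range(n_iter):
        # E-step: component posteriors for the bias-compensated frames.
        log_p = np.log(weights)[None, :] + log_gauss_diag(y - b, means, variances)
        log_p -= log_p.max(axis=1, keepdims=True)               # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)               # (T, K)
        # M-step: precision-weighted average of residuals y_t - mu_k.
        resid = y[:, None, :] - means[None, :, :]               # (T, K, D)
        num = np.einsum("tk,kd,tkd->d", gamma, inv_var, resid)
        den = np.einsum("tk,kd->d", gamma, inv_var)
        b = num / den
    return b
```

Calling `estimate_bias(y, weights, means, variances)` on frames drawn from the mixture and shifted by a known offset should recover a value close to that offset; only the test utterance and the model parameters are used, which mirrors the setting described above.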
In this paper we document our experiences with developing speech recognition for medical transcription: a system that automatically transcribes doctor-patient conversations. Towards this goal, we built a system along two different methodological lines: a Connectionist Temporal Classification (CTC) phoneme-based model and a Listen, Attend and Spell (LAS) grapheme-based model. To train these models we used a corpus of anonymized conversations representing approximately 14,000 hours of speech. Because of noisy transcripts and alignments in the corpus, a significant amount of effort was invested in data cleaning. We describe a two-stage strategy we followed for segmenting the data. The data cleanup and the development of a matched language model were essential to the success of the CTC-based models. The LAS-based models, however, were found to be resilient to alignment and transcript noise and did not require the use of language models. The CTC models achieved a word error rate of 20.1%, and the LAS models achieved 18.3%. Our analysis shows that both models perform well on important medical utterances and can therefore be practical for transcribing medical conversations.
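For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. The sketch below assumes whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and the scoring pipeline actually used in the paper is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a hypothesis that drops one word from a six-word reference scores 1/6, or about 16.7%.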
We introduce a model that approximates full and block-diagonal covariances in a Gaussian mixture, while significantly reducing both the number of parameters to estimate and the computations required to evaluate the Gaussian likelihoods. The inverse covariance of each Gaussian is expressed as a mixture of a small set of prototype matrices. Estimation of both the mixture weights and the prototypes is performed using maximum likelihood estimation. Experiments on a variety of speech recognition tasks show that this model significantly outperforms a diagonal covariance model, while using the same number of Gaussian-dependent parameters.
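The following sketch illustrates how such a parameterization might be evaluated, assuming each precision matrix is a weighted sum of symmetric prototypes, P_k = sum_j lam[k, j] * S_j, and that the weights keep every P_k positive definite. The function name `log_likelihoods` and the array layout are assumptions made for illustration; training of the weights and prototypes is not shown. The quadratic terms x^T S_j x are computed once per frame and shared across all Gaussians, which is one plausible source of the computational saving.

```python
import numpy as np

def log_likelihoods(x, means, lams, prototypes):
    """Gaussian log-likelihoods with precisions built from shared prototypes.

    x: (D,) frame; means: (K, D); lams: (K, J) weights;
    prototypes: (J, D, D) symmetric matrices. Returns (K,).
    """
    dim = x.shape[0]
    # P_k = sum_j lams[k, j] * prototypes[j]
    precisions = np.tensordot(lams, prototypes, axes=1)         # (K, D, D)
    sign, logdet = np.linalg.slogdet(precisions)
    if np.any(sign <= 0):
        raise ValueError("every combined precision must be positive definite")
    # Shared per-frame work: one quadratic form per prototype.
    x_s_x = np.einsum("d,jde,e->j", x, prototypes, x)           # (J,)
    quad_xx = lams @ x_s_x                                      # (K,)
    # Terms involving the means can be precomputed per Gaussian.
    p_mu = np.einsum("kde,ke->kd", precisions, means)           # (K, D)
    lin = p_mu @ x                                              # (K,)
    const = np.einsum("kd,kd->k", means, p_mu)                  # (K,)
    quad = quad_xx - 2.0 * lin + const                          # (x-mu)^T P (x-mu)
    return 0.5 * logdet - 0.5 * dim * np.log(2.0 * np.pi) - 0.5 * quad
```

With J prototypes, each Gaussian carries only J precision weights in addition to its mean, so choosing J equal to the feature dimension gives the same number of Gaussian-dependent parameters as a diagonal covariance model.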