For large-vocabulary continuous speech recognition, the goal of training is to model phonemes precisely enough that one could reconstruct from the models a sequence of acoustic parameters that accurately represents the spectral characteristics of any naturally occurring sentence, including all coarticulation effects that arise either between phonemes in a word or across word boundaries. The aim at Dragon Systems is to collect and process enough training data to accomplish this goal for all of natural spoken English rather than for any one restricted task. The basic unit that must be trained is the "phoneme in context" (PIC), a sequence of three phonemes accompanied by a code for prepausal lengthening. At present, syllable and word boundaries are ignored in defining PICs. More than 16,000 training tokens, half isolated words and half short phrases, were phonemically labeled by a semi-automatic procedure using hidden Markov models. To model a phoneme in a specific context, a weighted average is constructed from training data involving the desired context and acoustically similar contexts. For use in HMM continuous-speech recognition, each PIC is converted to a Markov model that is a concatenation of one to six node models. No phoneme, in all its contexts, requires more than 64 distinct nodes, and the total number of node models ("phonemic segments") required to construct all PICs is only slightly more than 2000. As a result, the entire set of PICs can be adapted to a new speaker on the basis of a couple of thousand isolated words or a few hundred sentences of connected speech. The advantage of this approach to training is that it is not task-specific. From a single training database, Dragon Systems has constructed models for use in a 30,000-word isolated-word recognizer, for connected digits, and for two different thousand-word continuous-speech tasks.
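The PIC unit and the weighted averaging described above can be illustrated with a minimal Python sketch. It is not Dragon Systems' implementation: the names `PIC`, `phonemes_to_pics`, and `weighted_average`, the `"sil"` padding symbol, and the rule that only the last phoneme before a pause is flagged as prepausal are all assumptions made for illustration. The abstract does not say how the prepausal code is assigned or how the weights are chosen.

```python
from typing import List, NamedTuple, Sequence


class PIC(NamedTuple):
    """Hypothetical phoneme-in-context record: a triphone plus a prepausal flag."""
    left: str
    phoneme: str
    right: str
    prepausal: bool


def phonemes_to_pics(phonemes: Sequence[str], silence: str = "sil") -> List[PIC]:
    """Turn a phoneme sequence into PICs, ignoring syllable and word boundaries.

    Assumption: the utterance is padded with silence on both sides, and only
    the final phoneme before the pause receives the prepausal code.
    """
    padded = [silence, *phonemes, silence]
    last = len(padded) - 2  # index of the final real phoneme
    return [
        PIC(padded[i - 1], padded[i], padded[i + 1], prepausal=(i == last))
        for i in range(1, len(padded) - 1)
    ]


def weighted_average(vectors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Weighted mean of acoustic parameter vectors.

    Assumption: the caller gives a larger weight to data from the desired
    context and smaller weights to data from acoustically similar contexts.
    """
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector and at least one vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(vectors[0])
    return [
        sum(w * v[d] for v, w in zip(vectors, weights)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    for pic in phonemes_to_pics(["k", "ae", "t"]):
        print(pic)
    # Exact-context data weighted 3:1 against a similar context.
    print(weighted_average([[1.0, 2.0], [3.0, 4.0]], [3.0, 1.0]))
```

Because every PIC is keyed only by its neighbouring phonemes and the prepausal code, the same enumeration applies to any vocabulary, which is consistent with the abstract's claim that the training is not task-specific.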
This paper describes an algorithm for performing rapid match on continuous speech that makes it possible to recognize sentences from an 842-word vocabulary on a desktop 33 MHz 80486 computer in near real time. This algorithm relies on a combination of smoothing and linear segmentation together with the notion of word start groups. The total computation required appears to grow more slowly than linearly with vocabulary size, so larger vocabularies should be feasible with only moderately enhanced hardware. The rapid match algorithm described here is closely related to the one used in DragonDictate, Dragon's commercial 30,000-word discrete-utterance recognizer. [...] rapid match module to obtain a short list of plausible extensions. The key ideas that the algorithm relies on are linear segmentation, smoothing, acoustic clustering, and word start groupings. In subsequent sections we shall elaborate on these ideas and explain their role in rapid match. We shall then report some empirical results for a particular task that Dragon has chosen for development purposes: the dictation of mammography reports, using a vocabulary of 842 words. Other rapid match algorithms that are quite different in character have also been described in the literature [2], [3], [4].
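The abstract names smoothing and linear segmentation without defining them, so the following Python sketch shows only one generic reading of the two terms. Here smoothing is a moving average over neighbouring frames, and linear segmentation splits a run of frames into a fixed number of equal-length pieces and averages each piece. The function names `smooth` and `linear_segments`, the window width, and the segment count are hypothetical and should not be read as the paper's definitions or parameters.

```python
from typing import List, Sequence

Frame = Sequence[float]


def _mean(frames: Sequence[Frame]) -> List[float]:
    """Component-wise mean of a non-empty run of frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def smooth(frames: Sequence[Frame], width: int = 3) -> List[List[float]]:
    """Moving-average smoothing; the window is truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(frames)):
        lo = max(0, i - half)
        hi = min(len(frames), i + half + 1)
        out.append(_mean(frames[lo:hi]))
    return out


def linear_segments(frames: Sequence[Frame], n_segments: int) -> List[List[float]]:
    """Split frames into n_segments equal-length pieces and average each one.

    The boundaries depend only on position in time, not on the signal,
    which is what "linear" is taken to mean in this sketch.
    """
    if not frames or n_segments < 1:
        raise ValueError("need at least one frame and one segment")
    n = len(frames)
    segments = []
    for s in range(n_segments):
        lo = s * n // n_segments
        hi = max(lo + 1, (s + 1) * n // n_segments)  # never an empty piece
        segments.append(_mean(frames[lo:hi]))
    return segments


if __name__ == "__main__":
    frames = [[0.0], [2.0], [4.0], [6.0]]
    print(smooth(frames, width=3))
    print(linear_segments(frames, n_segments=2))
```

Reducing each stretch of speech to a few averaged vectors gives a fixed-size summary that is cheap to compare against many candidate words, which is the usual motivation for a fast pre-filter of this kind. Word start groups and acoustic clustering are not sketched because the abstract gives no detail on how they are formed.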
We present a 1000-word continuous speech recognition (CSR) system that operates in real time on a personal computer (PC). The system, designed for large-vocabulary natural language tasks, uses phonetic hidden Markov models (HMMs) and incorporates acoustic, phonetic, and linguistic sources of knowledge to achieve high recognition performance. We describe the various components of this system. We also present our strategy for achieving real-time recognition on the PC. Using a 486-based PC with a 29K-based add-on board, the recognizer has been timed at 1.1 times real time.