Hidden Markov models (HMMs) are currently a widely used and successful statistical method for automatic recognition of spoken utterances. This paper describes an adaptation of HMMs to automatic recognition of unrestricted handwritten words. Focusing on HMMs, we describe many interesting details of a recognition system for US city names with a 50,000-word vocabulary. This system includes feature extraction, classification, estimation of model parameters, and word recognition. The feature extraction module transforms a binary image into a sequence of feature vectors. The classification module consists of a transformation based on linear discriminant analysis and Gaussian soft-decision vector quantizers, which transform feature vectors into sets of symbols and associated likelihoods. Symbols and likelihoods form the input to both HMM training and recognition. HMM training, performed in several successive steps, requires only a small amount of gestalt-labeled data at the character level for initialization. Most of the training material needs to be labeled only as uppercase ASCII words. Step-by-step training is necessary because characters may occur in different writing styles, which is taken into account by a sophisticated model topology. HMM recognition, based on the Viterbi algorithm, runs on subsets of the whole vocabulary. A subset is selected by the ZIP-code recognition module and a statistically based estimate of the number of characters in the word to be recognized. HMM recognition uses a mixed breadth-first, depth-first search technique. It should further be mentioned that our recognition algorithm is segmentation-free, i.e. it works directly on lexicon words and not on presegmented characters.
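For readers unfamiliar with the Viterbi algorithm mentioned in the abstract above, the following is a minimal textbook sketch for a discrete HMM, written in the log domain. It is not the system described in the paper: the function name `viterbi`, the parameter layout, and the use of a single hard symbol per frame (rather than sets of symbols with likelihoods from soft-decision vector quantizers) are assumptions made for illustration, and the mixed breadth-first, depth-first search is not modeled.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    Hypothetical parameter layout, chosen for this sketch:
      obs            : list of observation symbol indices
      start_p[i]     : P(first state = i)
      trans_p[i][j]  : P(next state = j | current state = i)
      emit_p[i][k]   : P(symbol k | state i)
    """
    def log(p):
        # Zero probabilities map to -inf so forbidden paths are never chosen.
        return math.log(p) if p > 0 else float("-inf")

    n = len(start_p)
    # delta[i]: log-probability of the best path that ends in state i.
    delta = [log(start_p[i]) + log(emit_p[i][obs[0]]) for i in range(n)]
    backptr = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            # Best predecessor of state j at this time step.
            best_i = max(range(n), key=lambda i: delta[i] + log(trans_p[i][j]))
            ptr.append(best_i)
            new_delta.append(
                delta[best_i] + log(trans_p[best_i][j]) + log(emit_p[j][o])
            )
        delta = new_delta
        backptr.append(ptr)

    # Trace the best path backwards from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path
```

In a lexicon-driven recognizer of the kind the abstract outlines, one would presumably score the observation sequence against the model of each candidate word in the selected vocabulary subset and keep the word with the highest score; that outer loop is omitted here.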
This paper deals with different speaker adaptation methods for speech recognition systems that adapt automatically to new and unknown speakers in a short training phase. The adaptation techniques aim at transformations of feature vectors that are optimized with respect to some constraints. Two different adaptation strategies are appropriate. The first one is based on least mean squared error (MSE) optimization. The second method is a codebook-driven feature transformation. Both adaptation techniques are incorporated into two different recognition systems: dynamic time warping (DTW) and Hidden Markov Modelling (HMM). The results show that in both systems speaker-adaptive error rates are close to speaker-dependent error rates. In the best case, the mean error rate of four test speakers decreases by a factor of 6 (DTW recognizer) and 3 (HMM recognizer), respectively, compared to the interspeaker error rate without adaptation. Finally, a hardware realization of the speaker-adaptive HMM recognizer is described.
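The idea of an MSE-optimized feature transformation can be illustrated with a small sketch. The abstract does not specify the form of the transformation or its constraints, so everything here is an assumption: the sketch fits an independent affine map per feature dimension by ordinary least squares, and it presumes that frames of the new speaker have already been paired with reference frames (for example by a time alignment). The names `fit_affine_per_dim` and `transform` are hypothetical.

```python
def fit_affine_per_dim(new_speaker, reference):
    """Fit reference[d] ~ a[d] * new_speaker[d] + b[d] for each dimension d.

    Both arguments are equally long lists of feature vectors, assumed to be
    paired frame by frame. Returns a list of (a, b) tuples, one per dimension,
    that minimizes the squared error in that dimension.
    """
    n = len(new_speaker)
    dims = len(new_speaker[0])
    params = []
    for d in range(dims):
        xs = [v[d] for v in new_speaker]
        ys = [v[d] for v in reference]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # Fall back to a pure offset when the dimension is constant.
        a = cov_xy / var_x if var_x > 0 else 1.0
        b = mean_y - a * mean_x
        params.append((a, b))
    return params


def transform(vec, params):
    """Map one feature vector of the new speaker into the reference space."""
    return [a * x + b for x, (a, b) in zip(vec, params)]
```

A full linear transformation with cross-dimension terms would need a matrix least-squares solve; the per-dimension form is used here only to keep the example dependency-free.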
This paper presents an analysis of the Out-of-Vocabulary (OOV) word problem and results of experiments in language modeling of OOV words. In particular, we introduce the method of iterative substitution for correcting distortions caused by OOV words in the language model. We evaluate the results on two well-known spontaneous speech tasks: Verbmobil and ATIS. We show that perplexity as well as error rate reductions can be achieved using iterative substitution. Further, we present preliminary results in combining newspaper texts with the Verbmobil (spontaneous speech) corpus. We could reduce the perplexity of the Verbmobil test set by augmenting the training corpus with newspaper texts. Preliminary recognition results show only slight improvements in the word accuracy and detection of OOV words.
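To make the perplexity measure in the abstract above concrete, here is a minimal sketch of how test-set perplexity is commonly computed when OOV words are mapped to a single unknown-word token. It does not implement iterative substitution, which the abstract does not describe in detail. The unigram model, the add-one smoothing, the `<UNK>` token, and the function names are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter

UNK = "<UNK>"  # hypothetical placeholder token for all OOV words


def train_unigram(tokens, vocab):
    """Estimate add-one-smoothed unigram probabilities over vocab plus UNK."""
    # Every training token outside the vocabulary is counted as UNK.
    counts = Counter(t if t in vocab else UNK for t in tokens)
    total = sum(counts.values())
    size = len(vocab) + 1  # the vocabulary plus the UNK token
    return {w: (counts[w] + 1) / (total + size) for w in list(vocab) + [UNK]}


def perplexity(model, tokens):
    """Return 2 ** (average negative log2 probability per token)."""
    # Test tokens missing from the model are scored with the UNK probability.
    log_sum = sum(math.log2(model.get(t, model[UNK])) for t in tokens)
    return 2 ** (-log_sum / len(tokens))
```

Lower perplexity means the model assigns higher probability to the test text, which is the sense in which the abstract reports a reduction after augmenting the training corpus.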