Abstract-The description of a novel type of m-gram language model is given. The model offers, via a nonlinear recursive procedure, a computation- and space-efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. While the method has been developed for and successfully implemented in the IBM Real Time Speech Recognizers, its generality makes it applicable in other areas where the problem of estimating probabilities from sparse data arises.

Sparseness of data is an inherent property of any real text, and it is a problem one always encounters while collecting frequency statistics on words and word sequences (m-grams) from a text of finite size. This means that even for a very large data collection, the maximum likelihood estimation method does not allow us to adequately estimate probabilities of rare but nevertheless possible word sequences: many sequences occur only once ("singletons"); many more do not occur at all. The inadequacy of the maximum likelihood estimator and the necessity of estimating the probabilities of m-grams which did not occur in the text constitute the essence of the problem.

The main idea of the proposed solution is to reduce the unreliable probability estimates given by the observed frequencies and to redistribute the "freed" probability "mass" among m-grams which never occurred in the text. The reduction is achieved by replacing maximum likelihood estimates for m-grams having low counts with renormalized Turing's estimates [1], and the redistribution is done via recursive utilization of lower-level conditional distributions. We found Turing's method attractive because of its simplicity and its characterization as the optimal empirical Bayes estimator of a multinomial probability. Robbins [2] introduces the empirical Bayes methodology, and Nadas [3] gives various derivations of Turing's formula.

Let $N$ be the sample text size and let $n_r$ be the number of words (m-grams) which occurred in the text exactly $r$ times, so that

$$N = \sum_r r\,n_r. \qquad (1)$$

Turing's estimate $P_T$ for the probability of a word (m-gram) which occurred in the sample $r$ times is

$$P_T = \frac{r^*}{N}, \qquad \text{where} \qquad r^* = (r+1)\,\frac{n_{r+1}}{n_r}.$$

We call the procedure of replacing a count $r$ with a modified count $r'$ "discounting" and the ratio $r'/r$ a discount coefficient $d_r$. When $r' = r^*$, we have Turing's discounting.

Let us denote the m-gram $w_1, \ldots, w_m$ as $w_1^m$ and the number of times it occurred in the sample text as $c(w_1^m)$. Then the maximum likelihood estimate is
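To make the mechanics of Turing's estimate and the discount coefficients concrete, the following is a minimal Python sketch, not part of the paper: it builds the count-of-counts table $n_r$ from a toy set of bigram counts and evaluates $P_T = r^*/N$ and $d_r = r^*/r$. The helper names (`count_of_counts`, `turing_estimate`), the toy counts, and the fallback to the maximum likelihood estimate when $n_r$ or $n_{r+1}$ is zero are illustrative assumptions, not the paper's procedure.

```python
from collections import Counter

def count_of_counts(counts):
    """Return the table n_r: how many distinct m-grams occurred exactly r times."""
    return Counter(counts.values())

def turing_estimate(r, n, N):
    """Turing's estimate P_T = r*/N for an m-gram observed r times,
    with the adjusted count r* = (r + 1) * n_{r+1} / n_r.

    Falls back to the maximum likelihood estimate r/N when n_r or n_{r+1}
    is zero (an assumption made for this sketch only; the paper applies
    the discount to low counts and treats the remaining cases separately).
    """
    if n.get(r, 0) == 0 or n.get(r + 1, 0) == 0:
        return r / N
    r_star = (r + 1) * n[r + 1] / n[r]
    return r_star / N

# Toy bigram counts standing in for c(w_1^m) collected from a sample text.
counts = {("the", "cat"): 3, ("a", "dog"): 2,
          ("the", "dog"): 1, ("a", "cat"): 1, ("the", "mouse"): 1}
N = sum(counts.values())        # sample size; equals sum_r r * n_r, Eq. (1)
n = count_of_counts(counts)     # here n_1 = 3, n_2 = 1, n_3 = 1

for r in sorted(set(counts.values())):
    p_t = turing_estimate(r, n, N)
    d_r = p_t * N / r           # discount coefficient d_r = r*/r
    print(f"r = {r}:  P_T = {p_t:.3f}  d_r = {d_r:.2f}")
```

With counts this small the resulting coefficients are not meaningful in themselves; the example only shows how $n_r$, $r^*$, $P_T$, and $d_r$ are related.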