A method for estimating the parameters of hidden Markov models of speech is described. Parameter values are chosen to maximize the mutual information between an acoustic observation sequence and the corresponding word sequence. Recognition results are presented comparing this method with maximum likelihood estimation.
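The contrast between the two criteria can be sketched concretely. In the minimal Python sketch below (not the paper's implementation), the arrays log_lik and log_prior are hypothetical placeholders: log_lik[w] stands for the acoustic log-likelihood log P(O | W_w) of the observation sequence under each candidate word sequence, and log_prior[w] for the language-model prior log P(W_w). Maximum likelihood training scores only the correct transcription, whereas the mutual-information criterion scores it against all competing word sequences:

```python
import numpy as np

def ml_objective(log_lik, correct):
    """Maximum likelihood: the acoustic log-likelihood of the
    correct word sequence alone."""
    return log_lik[correct]

def mmi_objective(log_lik, log_prior, correct):
    """Mutual-information criterion: the log-posterior of the correct
    word sequence against all competitors. Since the language model
    P(W) is held fixed, maximizing this is equivalent to maximizing
    the mutual information between O and W."""
    joint = log_lik + log_prior            # log P(O, W_w) per candidate
    evidence = np.logaddexp.reduce(joint)  # log P(O), summed over all W_w
    return joint[correct] - evidence       # log P(W_correct | O)

# Toy numbers: three candidate word sequences, index 0 is correct.
log_lik = np.array([-10.0, -11.5, -12.0])
log_prior = np.log(np.array([0.5, 0.3, 0.2]))
print(ml_objective(log_lik, 0))               # ignores the competition
print(mmi_objective(log_lik, log_prior, 0))   # rewards discrimination
```

The difference in the sketch is the normalizing term: maximum likelihood can raise the score of the correct sequence while raising competitors' scores just as much, whereas the mutual-information objective improves only when the correct sequence gains relative to its competitors.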
Using counterexamples, we show that vocabulary size and static and dynamic branching factors are all inadequate as measures of the difficulty of speech recognition tasks defined by finite-state grammars. Information-theoretic arguments show that perplexity (the logarithm of which is the familiar entropy) is a more appropriate measure of equivalent choice. It too has certain weaknesses, which we discuss. We show that perplexity can also be applied to languages having no obvious statistical description, since an entropy-maximizing probability assignment can be found for any finite-state grammar. Table I shows perplexity values for some well-known speech recognition tasks; a toy perplexity computation is sketched after the table.
Table I. Perplexity values for some well-known speech recognition tasks.

Task          Phone perplexity   Word perplexity   Vocabulary size   Dynamic branching factor
IBM-Lasers    2.14               21.11             1000              1000
IBM-Raleigh   1.69               7.74              250               7.32
CMU-AIX05     1.52               6.41              1011              35
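The sense in which perplexity measures "equivalent choice" can be illustrated with a toy computation (the probabilities below are illustrative, not taken from Table I). Perplexity is 2 raised to the entropy, i.e., the average negative log2-probability per word that a model assigns to a test sequence:

```python
import math

def perplexity(word_probs):
    """Perplexity = 2**entropy, where entropy is the average negative
    log2-probability per word under the model."""
    entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** entropy

# A uniform choice among 8 equally likely words gives perplexity 8,
# matching the intuition of "equivalent choice":
print(perplexity([1 / 8] * 10))               # 8.0
# Unequal probabilities pull perplexity below the raw branching factor:
print(perplexity([0.5, 0.25, 0.25, 0.5]))     # about 2.83
```

This is why the word perplexities in Table I can fall far below the vocabulary size or dynamic branching factor: a grammar that strongly constrains which words may follow offers the recognizer far less effective choice than a raw count of alternatives suggests.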