Centre de recherche informatique de Montréal (CRIM), Montréal
Communications Systems Group

ABSTRACT

In several speech recognition tasks, Maximum Mutual Information Estimation (MMIE) of Hidden Markov Model (HMM) parameters can substantially improve recognition results [1,2]. However, it is usually implemented using gradient descent, which can have very slow convergence. Recently, Gopalakrishnan et al. [3] introduced a reestimation formula for discrete HMMs which applies to rational objective functions (like the MMIE criterion). We analyze this formula and show how its convergence rate can be substantially improved. We introduce our "corrective MMIE training" algorithm, which, when applied to the TI/NIST connected digit database, has allowed us to reduce our string error rate by close to 50%. We extend Gopalakrishnan's result to the continuous case by proposing a new formula for estimating the mean and variance parameters of diagonal Gaussian densities.

[...] [5]. The objective function used in MMIE [...]. It is not intuitively obvious how (1) relates to reducing the error rate. It can be shown that, if certain assumptions are met, MMIE increases the a posteriori probability of the correct transcription, which is also the criterion used in the decoder. However, these [...]
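For concreteness, the MMIE objective function referred to as (1) above is commonly written in the following form (a sketch in generic notation; the symbols $\lambda$, $O_u$, and $w_u$ are ours and are not taken from the paper):

\[
R(\lambda) = \sum_{u=1}^{U} \log \frac{P_\lambda(O_u \mid w_u)\, P(w_u)}{\sum_{w} P_\lambda(O_u \mid w)\, P(w)}
\]

where $O_u$ is the $u$-th training utterance, $w_u$ its correct transcription, and the denominator sums over all competing word sequences. Maximizing $R(\lambda)$ raises the a posteriori probability of the correct transcriptions relative to the competitors, which is the connection to the decoding criterion noted above.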
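The reestimation formula of Gopalakrishnan et al. [3] for rational objective functions, applied to a discrete HMM parameter set $\{\theta_i\}$ constrained to sum to one, is usually quoted as follows (again a sketch; the constant $D$ and the notation are our assumptions rather than the paper's):

\[
\bar{\theta}_i = \frac{\theta_i \left( \dfrac{\partial R}{\partial \theta_i} + D \right)}{\sum_j \theta_j \left( \dfrac{\partial R}{\partial \theta_j} + D \right)}
\]

For a sufficiently large constant $D$ this update is guaranteed to increase $R$, but larger values of $D$ make the step smaller; the choice of $D$ is what governs the convergence rate discussed in the abstract.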
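For the continuous-density extension mentioned in the abstract, later literature usually quotes extended Baum-Welch updates of the following form for the mean $\mu$ and variance $\sigma^2$ of each diagonal Gaussian component (a hedged sketch; whether this matches the paper's exact formula, and the notation $\gamma^{num}_t$, $\gamma^{den}_t$ for occupation counts from the correct transcription and from the full recognition model, are assumptions on our part):

\[
\hat{\mu} = \frac{\sum_t \left(\gamma^{num}_t - \gamma^{den}_t\right) o_t + D\,\mu}{\sum_t \left(\gamma^{num}_t - \gamma^{den}_t\right) + D},
\qquad
\hat{\sigma}^2 = \frac{\sum_t \left(\gamma^{num}_t - \gamma^{den}_t\right) o_t^2 + D\left(\sigma^2 + \mu^2\right)}{\sum_t \left(\gamma^{num}_t - \gamma^{den}_t\right) + D} - \hat{\mu}^2
\]

Here $o_t$ is the observation at time $t$ and $D$ plays the same damping role as in the discrete formula above.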