The aim of this work is to establish a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criterion, together with the optimization methods gradient descent (GD) and the extended Baum (EB) algorithm. A restricted recognition method based on tree search and word graphs is presented to reduce the computational complexity of large vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words, and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits and the SieTill corpus for telephone-line recorded German digits. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. The best results were obtained using a unigram language model for MMI training. No significant correlation was observed between the language models chosen for training and recognition. © 2001 Elsevier Science B.V. All rights reserved.
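To make the structure of the unified criterion concrete, the following is a minimal sketch in generic notation (the symbols $f$, $\mathcal{M}_n$, and $\theta$ are illustrative assumptions, not necessarily the paper's exact formulation): with $f$ the identity and $\mathcal{M}_n$ the full set of word sequences, an MMI-like criterion results, whereas a sigmoid $f$ and a restricted $\mathcal{M}_n$ yield an MCE-like criterion.

```latex
% Hedged sketch of a smoothed likelihood-ratio training criterion:
%   f       : optional smoothing function (identity -> MMI-like; sigmoid -> MCE-like)
%   W_n     : spoken (correct) word sequence of training utterance X_n
%   M_n     : chosen set of competing word sequences
%   theta   : acoustic model parameters
F(\theta) = \sum_{n=1}^{N} f\!\left(
  \log \frac{p_\theta(X_n \mid W_n)\, p(W_n)}
            {\sum_{W \in \mathcal{M}_n} p_\theta(X_n \mid W)\, p(W)}
\right)
```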