“…Most applications of discriminative training methods for speech recognition use either the maximum mutual information (MMI) (Bahl et al., 1986; Brown, 1987; Cardin et al., 1993; Chow, 1990; Kapadia et al., 1993; Normandin, 1996; Normandin et al., 1994a,b; Normandin and Morgera, 1991; Reichl and Ruske, 1995; Valtchev et al., 1996, 1997) or the minimum classification error (MCE) (Chou et al., 1992, 1993, 1994; Paliwal et al., 1995; Reichl and Ruske, 1995) criterion. In MCE training, an approximation to the error rate on the training data is optimized, whereas MMI training optimizes the a posteriori probability of the training utterances and hence the class separability.…”
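
The contrast between the two criteria can be illustrated with a minimal sketch. It assumes each training utterance is summarized by a list of per-class joint log scores, log p(X, W), and uses one common textbook form of each criterion: the summed log posterior of the correct class for MMI, and a sigmoid-smoothed misclassification measure for MCE. The function names and the smoothing parameters `eta` and `gamma` are hypothetical choices for illustration and are not taken from the excerpt.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_objective(scores, labels):
    """Sum over utterances of the log posterior of the correct class.

    scores[r][k] is an assumed joint log score log p(X_r, W_k);
    labels[r] is the index of the correct class. Higher is better.
    """
    total = 0.0
    for s, c in zip(scores, labels):
        total += s[c] - log_sum_exp(s)
    return total


def mce_loss(scores, labels, eta=1.0, gamma=1.0):
    """Smoothed count of training errors (lower is better).

    Assumed form: d = soft-max over rival scores minus the correct score,
    passed through a sigmoid with slope gamma. As gamma grows, the loss
    approaches the number of misclassified utterances.
    """
    total = 0.0
    for s, c in zip(scores, labels):
        rivals = [eta * g for j, g in enumerate(s) if j != c]
        anti = (log_sum_exp(rivals) - math.log(len(rivals))) / eta
        d = anti - s[c]  # positive when a rival outscores the correct class
        total += 1.0 / (1.0 + math.exp(-gamma * d))
    return total


# Two utterances, both with correct class 0; the second is misrecognized.
scores = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 0]
print(mmi_objective(scores, labels))
print(mce_loss(scores, labels, gamma=50.0))
```

With a steep sigmoid (large `gamma`) the MCE value is close to the raw error count on the training data, while the MMI value keeps changing with the posterior margin even for utterances that are already recognized correctly.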