In this paper we show how a discriminative objective function such as Maximum Mutual Information (MMI) can be combined with a prior distribution over the HMM parameters to give a discriminative Maximum A Posteriori (MAP) estimate for HMM training. The prior distribution can be based around the Maximum Likelihood (ML) parameter estimates, leading to a technique previously referred to as I-smoothing; or, for adaptation, it can be based around a MAP estimate of the ML parameters, leading to what we call MMI-MAP. This latter approach is shown to be effective for task adaptation, where data from one task (Voicemail) is used to adapt an HMM set trained on another task (Switchboard). It is shown that MMI-MAP yields a 2.1% absolute reduction in word error rate relative to standard ML-MAP with 30 hours of Voicemail task adaptation data, starting from an MMI-trained Switchboard system.
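The discriminative MAP idea described above can be stated compactly. The notation here (λ for the HMM parameters, F_MMI for the MMI objective, p(λ) for the prior) is our own shorthand, not taken from the paper:

```latex
\hat{\lambda} = \arg\max_{\lambda} \;\Big[\, \mathcal{F}_{\mathrm{MMI}}(\lambda) \;+\; \log p(\lambda) \,\Big]
```

When the prior $p(\lambda)$ is centred on the ML parameter estimates, this recovers I-smoothing; when it is centred on an ML-MAP estimate built from the adaptation data, it gives the MMI-MAP scheme evaluated in the paper.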
Abstract: Recently there has been interest in combined generative/discriminative classifiers. In these classifiers, features for the discriminative models are derived from generative kernels. One advantage of using generative kernels is that systematic approaches exist for introducing complex dependencies beyond conditional independence assumptions. Furthermore, with generative kernels, model-based compensation/adaptation techniques can be applied to make discriminative models robust to noise/speaker conditions. This paper extends previous work on combined generative/discriminative classifiers in several directions. First, it introduces derivative kernels based on context-dependent generative models. Second, it describes how derivative kernels can be incorporated in continuous discriminative models. Third, it addresses the issues associated with the large numbers of classes and parameters that arise when context-dependent models and the high-dimensional features of derivative kernels are used. The approach is evaluated on two noise-corrupted tasks: the small vocabulary AURORA 2 task and the medium-to-large vocabulary AURORA 4 task.
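Derivative kernels build feature vectors from derivatives of a generative model's log-likelihood with respect to its parameters (Fisher-score features). A minimal sketch, assuming a diagonal-covariance Gaussian mixture as the generative model; the function names and the restriction to mean derivatives are illustrative simplifications, not the paper's setup:

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Component posteriors gamma_m and log p(x) for a diagonal-covariance GMM."""
    # log [w_m * N(x; mu_m, sigma_m^2)] for each component m
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances)
                               + (x - means) ** 2 / variances, axis=1))
    log_px = np.logaddexp.reduce(log_comp)          # log p(x)
    return np.exp(log_comp - log_px), log_px

def derivative_features(x, weights, means, variances):
    """Score-space features: d log p(x) / d mu_m = gamma_m * (x - mu_m) / sigma_m^2,
    stacked over all components into one high-dimensional vector."""
    gammas, _ = gmm_posteriors(x, weights, means, variances)
    return (gammas[:, None] * (x - means) / variances).ravel()
```

Note that even this toy example has dimensionality (components × feature dimension), which hints at why the paper must address the parameter growth caused by context-dependent models.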
In previous papers the use of Parallel Model Combination (PMC) for noise robustness has been described. Various fast implementations have been proposed, though to date, in order to compensate all the parameters of a system, it has been necessary to perform Gaussian integration. This paper introduces an alternative method that can compensate all the parameters of the recognition system, whilst reducing the computational load of this task. Furthermore, the technique offers an additional degree of flexibility, as it allows the number of components to be chosen and optimised using standard iterative techniques. The new technique is referred to as Data-driven PMC (DPMC). It is evaluated on the Resource Management database, with noise artificially added from the NOISEX-92 database. The performance of DPMC is found to be comparable to PMC, at a far lower computational cost. In complex noise environments, by more accurately modelling the noise source using multiple components, and then reducing the number of components to the original number, a slight improvement in performance is obtained.
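The data-driven idea can be illustrated in miniature: draw samples from the clean-speech and noise models, pass them through the mismatch function, and re-estimate the corrupted-speech model from the resulting samples rather than integrating analytically. This sketch assumes single-Gaussian clean and noise models in the log-spectral domain with an additive-noise mismatch function; the function name and the single-Gaussian simplification are ours, not the paper's HMM setting:

```python
import numpy as np

rng = np.random.default_rng(0)

def dpmc_compensate(clean_mu, clean_var, noise_mu, noise_var, n_samples=20000):
    """Monte-Carlo compensation: sample clean speech and noise, combine them
    with the log-add mismatch function, and fit a Gaussian to the samples."""
    s = rng.normal(clean_mu, np.sqrt(clean_var), n_samples)  # clean log-spectra
    n = rng.normal(noise_mu, np.sqrt(noise_var), n_samples)  # noise log-spectra
    x = np.logaddexp(s, n)   # log(exp(s) + exp(n)): noise is additive linearly
    return x.mean(), x.var() # ML Gaussian estimate of corrupted speech
```

In the full scheme the same samples can be used to fit a mixture with any chosen number of components via EM, which is what gives DPMC its flexibility over closed-form PMC approximations.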