IEEE International Conference on Acoustics, Speech, and Signal Processing 2002
DOI: 10.1109/icassp.2002.5743665
Minimum Phone Error and I-smoothing for improved discriminative training

Abstract: In this paper we introduce the Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria for the discriminative training of HMM systems. The MPE/MWE criteria are smoothed approximations to the phone or word error rate respectively. We also discuss I-smoothing which is a novel technique for smoothing discriminative training criteria using statistics for maximum likelihood estimation (MLE). Experiments have been performed on the Switchboard/Call Home corpora of telephone conversations with up to 265 hours …
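For context, the MPE criterion described in the abstract is commonly written as a lattice-weighted average of raw phone accuracy. The sketch below uses notation that is standard in the MPE literature (acoustic likelihood scaling κ, reference transcription s_r for utterance r), not symbols taken from this page:

\[
\mathcal{F}_{\mathrm{MPE}}(\lambda) \;=\; \sum_{r=1}^{R}
\frac{\sum_{s} p_{\lambda}(\mathcal{O}_r \mid s)^{\kappa}\, P(s)\, \mathrm{A}(s, s_r)}
     {\sum_{s'} p_{\lambda}(\mathcal{O}_r \mid s')^{\kappa}\, P(s')}
\]

Here A(s, s_r) is the raw phone accuracy of hypothesis s against the reference s_r; MWE is the same quantity measured at the word level. Maximising this expected accuracy over all lattice hypotheses, rather than a hard error count, is what makes the criterion a smoothed approximation to the error rate.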

Cited by 425 publications (356 citation statements)
References 6 publications
“…Both systems extract Mel frequency cepstral coefficients (MFCCs) with standard normalization and adaptation techniques: cepstral vocal tract length normalization (VTLN), heteroscedastic linear discriminant analysis (HLDA), cepstral mean and variance normalization, and maximum likelihood linear regression (MLLR). Both systems have gender-dependent acoustic models trained discriminatively using variants of minimum phone error (MPE) training with maximum mutual information (MMI) priors (Povey and Woodland, 2002). Both train their acoustic models on approximately 2400 hours of conversational telephone speech from the Switchboard, CallHome and Fisher corpora, consisting of 360 hours of speech used in the 2003 evaluation plus 1820 hours of noisy "quick transcriptions" from the Fisher corpus, although with different segmentation and filtering.…”
Section: Data
confidence: 99%
“…It works in the same manner as standard MAP, except that the input HMM has to be discriminatively trained with the same objective function. For discriminative adaptation it is strongly recommended to use the I-smoothing method to improve the stability of the new estimates [13].…”
Section: Frame-discriminative Adaptation
confidence: 99%
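As a reminder of what "the same manner as standard MAP" refers to, the usual MAP mean update interpolates the prior model's mean with statistics collected from the adaptation data; in the discriminative variant quoted above, the prior mean would come from the MPE/MMI-trained model. The symbols below (prior weight τ, occupancy γ_jm(t)) are generic notation, not taken from the quoted paper:

\[
\hat{\mu}_{jm} \;=\; \frac{\tau\,\mu_{jm} \;+\; \sum_{t}\gamma_{jm}(t)\,\mathbf{o}_t}
                          {\tau \;+\; \sum_{t}\gamma_{jm}(t)}
\]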
“…To improve the generality of MBE training, the I-smoothing technique [10] is employed to provide better parameter estimates. This technique can be regarded as interpolating the MBE and ML auxiliary functions according to the amount of data available for each Gaussian mixture …”
Section: I-smoothing Update
confidence: 99%
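A minimal sketch of that interpolation, assuming per-Gaussian sufficient statistics (occupancy, first- and second-order sums) have already been accumulated separately for the discriminative numerator and for ML training; the function name, the default value of tau, and the statistics layout are illustrative choices, not taken from the cited papers.

def i_smooth(num_occ, num_sum, num_sqsum,
             ml_occ, ml_sum, ml_sqsum, tau=50.0):
    """Blend tau 'points' of ML-distributed data into the discriminative
    numerator statistics of one Gaussian (hypothetical helper)."""
    scale = tau / ml_occ  # rescale the ML statistics to carry weight tau
    return (num_occ + tau,
            num_sum + scale * ml_sum,
            num_sqsum + scale * ml_sqsum)

Gaussians with plenty of discriminative data are barely affected, while poorly observed ones are pulled towards their ML estimates, which is the data-dependent interpolation the quoted passage describes.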
“…The extended EM algorithm, which utilizes a weak-sense auxiliary function [9] and has been applied in the minimum phone error (MPE) discriminative training approach [10] for ASR, can be adapted to solve Eq.(6). …”
Section: Objective Function Optimization and Update Formulae
confidence: 99%
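For readers unfamiliar with that optimisation step, the weak-sense auxiliary function leads to an extended Baum-Welch style re-estimation of the Gaussian parameters; a sketch of the mean update is given below. The helper name and the way D is chosen are illustrative, and the variance update (omitted) follows the same pattern with second-order statistics.

def ebw_mean_update(num_occ, num_sum, den_occ, den_sum, old_mean, D):
    # Numerator statistics come from high-accuracy lattice arcs, denominator
    # statistics from competing arcs; the constant D keeps the update stable.
    return (num_sum - den_sum + D * old_mean) / (num_occ - den_occ + D)

In practice D is set per Gaussian, typically proportional to the denominator occupancy and large enough to keep the updated variances positive.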