1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258) 1999
DOI: 10.1109/icassp.1999.758061

The 1998 HTK system for transcription of conversational telephone speech

Abstract: This paper describes the 1998 HTK large vocabulary speech recognition system for conversational telephone speech as used in the NIST 1998 Hub5E evaluation. Front-end and language modelling experiments conducted using various training and test sets from both the Switchboard and Callhome English corpora are presented. Our complete system includes reduced bandwidth analysis, side-based cepstral feature normalisation, vocal tract length normalisation (VTLN), triphone and quinphone hidden Markov models (HMMs) built …

Cited by 50 publications (42 citation statements). References 6 publications.
“…The most popular VTLN technique performs speaker-specific piecewise linear frequency scaling of the Mel-Frequency Cepstral Coefficients (MFCCs) [3]. The overall improvement in the word error rate (WER) obtained with this technique is usually on the order of 0.6% as compared to results obtained without VTLN.…”
Section: Introduction (mentioning)
confidence: 99%
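
To make the warping described in the statement above concrete, here is a minimal sketch of a two-segment piecewise linear frequency warp. The function name, the 3000 Hz breakpoint and the 4000 Hz upper band edge (assuming 8 kHz telephone sampling) are illustrative assumptions, not values taken from the cited paper. Below the breakpoint the frequency is scaled by the speaker's warp factor; above it, a second linear segment brings the warped axis back to the band edge so the bandwidth is preserved.

```python
def piecewise_linear_warp(freq_hz, alpha, cutoff_hz=3000.0, nyquist_hz=4000.0):
    """Warp one frequency with a two-segment piecewise linear function.

    alpha is the speaker-specific warp factor. cutoff_hz and nyquist_hz
    are assumed values chosen for illustration only.
    """
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("frequency outside the analysis band")
    if alpha * cutoff_hz >= nyquist_hz:
        raise ValueError("warp factor too large for this breakpoint")
    if freq_hz <= cutoff_hz:
        # First segment: plain scaling by the warp factor.
        return alpha * freq_hz
    # Second segment: straight line from (cutoff, alpha * cutoff)
    # to (nyquist, nyquist), so the band edge maps onto itself.
    slope = (nyquist_hz - alpha * cutoff_hz) / (nyquist_hz - cutoff_hz)
    return alpha * cutoff_hz + slope * (freq_hz - cutoff_hz)
```

With `alpha = 1.0` the function is the identity, which is a convenient sanity check.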
“…Acoustic models are phonetic decision tree state clustered triphone models with standard left-to-right 3-state topology. They were obtained using standard HTK maximum likelihood training procedures (see for example [11]). The system uses approximately 7000 states where each state is represented as a mixture of 16 Gaussians.…”
Section: Acoustic Modelling (mentioning)
confidence: 99%
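
As a rough illustration of what "a mixture of 16 Gaussians" per state means computationally, the sketch below evaluates the log-likelihood of one feature vector under a Gaussian mixture. Diagonal covariances are an assumption here (the statement above does not say which covariance form was used), and the function and parameter names are hypothetical.

```python
import math

def diag_gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    weights: one mixture weight per component (should sum to 1).
    means, variances: one list per component, same length as x.
    Diagonal covariance is an assumption made for this sketch.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the Gaussian normalisation constant.
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        # Log of the exponential term.
        log_exp = -0.5 * sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        log_terms.append(math.log(w) + log_norm + log_exp)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))
```

At the sizes quoted above, roughly 7000 states with 16 components each correspond to about 112,000 Gaussians in total.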
“…Speaker adaptive training is performed in the form of vocal tract length normalisation (VTLN) both in training and test. Warp factors are estimated using a parabolic search procedure, a piecewise linear warping function and a maximum likelihood criterion [11]. Speaker adaptation is performed using maximum likelihood linear regression (MLLR) of the means and variances [8].…”
Section: Acoustic Modelling (mentioning)
confidence: 99%
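
The statement above does not spell out the parabolic search, so the following is only one plausible sketch of the idea: fit a parabola through three (warp factor, score) points, jump to its vertex, and repeat. The `score` callable is hypothetical; in a real system it would return the log-likelihood of a speaker's data after warping with the given factor. The search range 0.8 to 1.2, the iteration count and the tolerance are likewise assumptions.

```python
def parabola_vertex(p0, p1, p2):
    """Return the x position of the maximum of the parabola through three
    (x, y) points, or None if the fitted parabola has no maximum."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2 * x2 * (y0 - y1) + x1 * x1 * (y2 - y0) + x0 * x0 * (y1 - y2)) / denom
    if a >= 0.0:
        return None  # flat or opening upwards: no maximum to jump to
    return -b / (2.0 * a)

def estimate_warp_factor(score, lo=0.8, hi=1.2, iterations=5, tol=1e-4):
    """Maximise score(alpha) on [lo, hi] by repeated parabolic interpolation."""
    points = [(a, score(a)) for a in (lo, 0.5 * (lo + hi), hi)]
    for _ in range(iterations):
        vertex = parabola_vertex(*sorted(points))
        if vertex is None:
            break
        vertex = min(max(vertex, lo), hi)  # stay inside the search range
        if any(abs(vertex - x) < tol for x, _ in points):
            break  # converged onto a point already evaluated
        # Swap the lowest-scoring point for the new candidate.
        points.remove(min(points, key=lambda p: p[1]))
        points.append((vertex, score(vertex)))
    return max(points, key=lambda p: p[1])[0]
```

Each iteration costs one extra evaluation of `score`, which matters when every evaluation requires re-extracting features and re-aligning the speaker's data.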
“…Within this category we find techniques such as RASTA-PLP (Hermansky and Morgan (1994)), CMN (Cepstral Mean Normalisation) (Furui (1981)), SCMN (Segmental Cepstral Mean Normalisation) (Viikki and Laurila (1998)), VTLN (Vocal Tract Length Normalisation) (Hain et al (1999)) or histogram equalization (de la Torre et al (2005)). …”
Section: Introduction (mentioning)
confidence: 99%
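
Of the normalisation techniques listed in the statement above, cepstral mean normalisation is the simplest to show in code. The sketch below subtracts the per-dimension mean over a block of frames; the function name is hypothetical, and treating one conversation side as the block (as the abstract's "side-based" normalisation suggests) is an assumption about how the frames would be grouped.

```python
def cepstral_mean_normalise(frames):
    """Subtract the per-dimension mean from a list of cepstral frames.

    frames: list of equal-length lists of floats, e.g. all frames from
    one conversation side (the grouping is an assumption of this sketch).
    """
    if not frames:
        return []
    count = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient over the whole block.
    means = [sum(frame[d] for frame in frames) / count for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]
```

After normalisation each coefficient has zero mean over the block, which removes a constant channel offset in the cepstral domain.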