The aim of this work is to establish a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criterion, together with the optimization methods gradient descent (GD) and the extended Baum (EB) algorithm. A restricted recognition method based on tree search and word graphs is presented to reduce the computational complexity of large vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words, and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits and the SieTill corpus for telephone-line recorded German digits. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. The best results were obtained using a unigram language model for MMI training. No significant correlation was observed between the language models chosen for training and recognition. © 2001 Elsevier Science B.V. All rights reserved.
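To make the structure of the unified criterion concrete, the following is a minimal sketch in generic notation (the symbols $f$, $\mathcal{M}_n$, and $\theta$ are illustrative assumptions, not necessarily the paper's exact formulation): with $f$ the identity and $\mathcal{M}_n$ the full set of word sequences, an MMI-like criterion results, whereas a sigmoid $f$ and a restricted $\mathcal{M}_n$ yield an MCE-like criterion.

```latex
% Hedged sketch of a smoothed likelihood-ratio training criterion:
%   f       : optional smoothing function (identity -> MMI-like; sigmoid -> MCE-like)
%   W_n     : spoken (correct) word sequence of training utterance X_n
%   M_n     : chosen set of competing word sequences
%   theta   : acoustic model parameters
F(\theta) = \sum_{n=1}^{N} f\!\left(
  \log \frac{p_\theta(X_n \mid W_n)\, p(W_n)}
            {\sum_{W \in \mathcal{M}_n} p_\theta(X_n \mid W)\, p(W)}
\right)
```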