The Application of Hidden Markov Models in Speech Recognition

Gales, Mark J. F.; Young, Steve

doi:10.1561/2000000004

Cited by 439 publications

(176 citation statements)

References 140 publications

(210 reference statements)

Supporting

Mentioning

174

Contrasting

Unclassified


“…The phrase modeling in this application is done by a whole-phrase continuous HMM [2,10,27]. The selected model is with a left-to-right topology with no skip state and the output distributions are represented as mixture of Gaussians with diagonal covariance matrices.…”

Section: HMM Speaker Verification (mentioning)

confidence: 99%

“…The selected model is with a left-to-right topology with no skip state and the output distributions are represented as mixture of Gaussians with diagonal covariance matrices. The HMM training is carried out by well-known Baum-Welch Algorithm [10,27]. In the verification are used the individual speaker's thresholds.…”

Section: HMM Speaker Verification (mentioning)

confidence: 99%
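The topology described in the two excerpts above can be sketched in a few lines. This is a minimal illustration only, not the citing paper's implementation: the function names, the self-loop probability of 0.6, and the parameter layout are assumptions made for the example, and Baum-Welch training is not shown.

```python
import numpy as np

def left_to_right_transitions(n_states, self_loop=0.6):
    """Transition matrix for a left-to-right HMM with no skip transitions.

    Each state may only stay where it is or move to the next state.
    The self_loop value is an illustrative initial guess, not a trained one.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if not 0.0 < self_loop < 1.0:
        raise ValueError("self_loop must lie strictly between 0 and 1")
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = self_loop            # stay in the current state
        A[i, i + 1] = 1.0 - self_loop  # advance by exactly one state
    A[-1, -1] = 1.0                    # final state only loops on itself
    return A

def diag_gmm_logpdf(x, weights, means, variances):
    """Log-likelihood of one frame x under a diagonal-covariance GMM.

    weights: shape (K,), means and variances: shape (K, D), x: shape (D,).
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    d = x.shape[0]
    # Per-component Gaussian log-density with a diagonal covariance matrix.
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    # Log-sum-exp over the mixture components.
    return float(np.logaddexp.reduce(np.log(weights) + log_norm + log_exp))
```

The zeros above the first superdiagonal are what "no skip state" means in practice: a state cannot jump two or more positions ahead, and re-estimation with Baum-Welch keeps those zeros in place.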

“…In the second experiment the performance of the endpoints detection algorithms in terms of the recognition rate is estimated via two fixed-text speaker verification applications. The first application is based on the Dynamic Time Warping (DTW) algorithm [21] while the second one uses the left-to-right HMM paradigm [10]. The verification results are compared to these obtained by the manual endpoint detection.…”

Section: Introduction (mentioning)

confidence: 99%
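For readers unfamiliar with the DTW baseline mentioned in the excerpt above, the following is a textbook sketch of the alignment cost. It assumes a Euclidean frame distance and the basic symmetric step pattern with no window or slope constraint; the variant used in the cited work [21] may differ, and the function name is chosen for this example.

```python
import numpy as np

def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature frames.

    a, b: sequences of scalars or of D-dimensional frames.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat scalars as 1-dimensional frames
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # D[i, j] holds the best cost of aligning a[:i] with b[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # a advances, b repeats
                                 D[i, j - 1],      # b advances, a repeats
                                 D[i - 1, j - 1])  # both advance
    return float(D[n, m])
```

In a fixed-text verification setting, a score of this kind would be compared against a threshold; how that threshold is set per speaker is not specified in the excerpt.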

LTSD and GDMD features for Telephone Speech Endpoint Detection

Ouzounov

2017

Cybernetics and Information Technologies


“…Each context-dependent unit is typically represented by a hidden Markov model (HMM) with Gaussian mixture observation densities, which account for the remaining acoustic variation among different instances of the same unit. For further details about the architecture of standard HMM-based recognizers, see [4]. (Footnote 1) Linguists distinguish phones, the acoustic realizations of speech sounds, from phonemes, abstract sound units that may each correspond to multiple phones, such that a change in a single phoneme can change a word's identity.…”

Section: B Phones and Context-dependent Phones (mentioning)

confidence: 99%

“…Typical sub-phonetic features are articulatory features, which may be binary or multivalued and characterize in some way the configuration of the vocal tract. 4 Roughly 80% of phonetic substitutions of consonants in the Switchboard Transcription Project data consist of a single articulatory feature change [10]. In addition, effects such as nasalization, rounding, and stop consonant epenthesis can be the result of asynchrony between articulatory trajectories [50].…”

Section: Sub-phonetic Feature Models (mentioning)

confidence: 99%

Subword Modeling for Automatic Speech Recognition: Past, Present, and Emerging Approaches

Livescu

Fosler‐Lussier

Metze

2012

IEEE Signal Process. Mag.

Abstract: Modern automatic speech recognition systems handle large vocabularies of words, making it infeasible to collect enough repetitions of each word to train individual word models. Instead, large-vocabulary recognizers represent each word in terms of sub-word units. Typically the sub-word unit is the phone, a basic speech sound such as a single consonant or vowel. Each word is then represented as a sequence, or several alternative sequences, of phones specified in a pronunciation dictionary. Other choices of sub-word units have been studied as well. The choice of sub-word units, and the way in which the recognizer represents words in terms of combinations of those units, is the problem of sub-word modeling. Different sub-word models may be preferable in different settings, such as high-variability conversational speech, high-noise conditions, low-resource settings, or multilingual speech recognition. This article reviews past, present, and emerging approaches to sub-word modeling. In order to make clean comparisons between many approaches, the review uses the unifying language of graphical models.
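The idea of representing each word as one or several phone sequences from a pronunciation dictionary can be illustrated with a small sketch. The lexicon entries, the ARPAbet-style phone symbols, and the function name below are hypothetical and chosen only for the example; they are not taken from the article.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to one or more pronunciations,
# and each pronunciation is a list of phone symbols.
LEXICON = {
    "the": [["dh", "ah"], ["dh", "iy"]],
    "cat": [["k", "ae", "t"]],
}

def phone_sequences(words, lexicon):
    """Expand a word sequence into every phone sequence the lexicon allows.

    Raises KeyError for a word that is missing from the lexicon.
    """
    variants = [lexicon[w] for w in words]
    # Take one pronunciation per word, then concatenate the phones in order.
    return [[p for pron in combo for p in pron] for combo in product(*variants)]
```

For example, `phone_sequences(["the", "cat"], LEXICON)` yields two phone sequences, one per pronunciation of "the". Real recognizers usually encode these alternatives compactly in a graph instead of enumerating them, since the number of combinations grows multiplicatively with sentence length.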
