Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

Hogden, John

doi:10.2172/431136

Cited by 6 publications

(5 citation statements)

References 23 publications


“…It was observed that smoothing the estimated articulatory trajectories improved estimation quality and the correlation and reduced the RMSE. This is a direct consequence of the observation made in [52], which claimed that articulatory motions are predominantly low pass in nature with a cutoff frequency of 15 Hz. This led us to introduce a Kalman smoother-based postprocessor in the architectures discussed above.…”

Section: Machine Learning Approaches For Speech Inversion (mentioning)

confidence: 55%


Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies

Mitra

Nam

Espy‐Wilson

et al. 2010

IEEE J. Sel. Top. Signal Process.


Many different studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process which is usually termed “speech-inversion.” This study aims to propose and compare various machine learning strategies for speech inversion: Trajectory mixture density networks (TMDNs), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural network (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.


“…Human articulator movements are predominantly low pass in nature [52] and the articulatory trajectories usually have a smoother path, defined by one that does not have any Fourier components over the cutoff frequency of 15 Hz. Nonlinear AR-ANN shown in Fig.…”

Section: Machine Learning Approaches For Speech Inversion (mentioning)

confidence: 99%

Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies

Mitra

Nam

Espy‐Wilson

et al. 2010

IEEE J. Sel. Top. Signal Process.
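The low-pass property quoted above can be illustrated with a minimal sketch that zeroes every Fourier component above the 15 Hz cutoff. This is an ideal DFT filter used as a simple stand-in, not the Kalman smoother the citing paper describes; the function name `lowpass_dft`, the 200 Hz sampling rate, and the synthetic trajectory are all assumptions made for illustration.

```python
import cmath
import math


def lowpass_dft(signal, sample_rate_hz, cutoff_hz=15.0):
    """Zero every DFT component above cutoff_hz and return the real signal."""
    n = len(signal)
    # Forward DFT (O(n^2); acceptable for a short illustrative trajectory).
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bin k and bin n - k correspond to the same physical frequency.
        freq_hz = min(k, n - k) * sample_rate_hz / n
        if freq_hz > cutoff_hz:
            spectrum[k] = 0.0
    # Inverse DFT; bins are removed in conjugate pairs, so the result is real.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Hypothetical trajectory: a 5 Hz movement plus 40 Hz jitter, sampled at 200 Hz.
fs = 200.0
n = 200
slow = [math.sin(2 * math.pi * 5 * t / fs) for t in range(n)]
noisy = [s + 0.3 * math.sin(2 * math.pi * 40 * t / fs) for t, s in enumerate(slow)]
smooth = lowpass_dft(noisy, fs)
max_err = max(abs(a - b) for a, b in zip(smooth, slow))
```

In this synthetic case both components fall on exact DFT bins, so the filtered trajectory matches the 5 Hz movement up to floating-point error. Real articulatory data would not separate this cleanly, which is one reason a smoother with a noise model may be preferred.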


“…As described in other documents (Hogden, 1996), MALCOM is an algorithm that can be used to estimate the probability of a sequence of categorical data. MALCOM can also be applied to speech (and other real valued sequences) if windows of the speech are first categorized using a technique such as vector quantization (Gray, 1984).…”

Section: C (mentioning)

confidence: 99%

MALCOM X: Combining maximum likelihood continuity mapping with Gaussian mixture models

Hogden

Scovel

1998

Self Cite
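The categorization step mentioned in the statement above, in which windows of speech are mapped to discrete codes by vector quantization, can be sketched as a nearest-codeword lookup. The codebook values, the two-dimensional frames, and the function name `quantize` are hypothetical; a real system would learn the codebook from data and use higher-dimensional acoustic features.

```python
def quantize(frames, codebook):
    """Map each real-valued frame to the index of its nearest codeword."""

    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))
        for frame in frames
    ]


# Hypothetical 2-D codebook and feature frames, for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 1.8), (0.8, 0.9)]
codes = quantize(frames, codebook)
print(codes)  # [0, 1, 2, 1]
```

The resulting integer sequence is the kind of categorical data the statement says MALCOM can assign a probability to.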


“…These properties are not unique to HMMs, however. Maximum Likelihood Continuity Mapping (MALCOM), outlined below, is also a stochastic model with learning rules that allow training on large volumes of data (1). The main difference between MALCOM models and HMMs is that the model underlying MALCOM is intended to incorporate important constraints on articulator motions, and therefore better reflect the generative processes underlying speech.…”

Section: Introduction (mentioning)

confidence: 99%

Stochastic word models for articulatorily constrained speech recognition and synthesis

Hogden

Nix

Gracco

et al. 1998

The Journal of the Acoustical Society of America


Stochastic word models based on hidden Markov modeling (HMM) techniques are commonly used for speech recognition and are increasingly being used for speech synthesis. An alternate technique for generating stochastic word models based on Maximum Likelihood Continuity Mapping (MALCOM) will be presented. MALCOM generates a stochastic model of speech assuming 1) that speech sounds are periodically emitted as a point moves smoothly through a low-dimensional space called a continuity map (CM), and 2) that the sound emitted at time t is a probabilistic function of the position of the point at time t. The assumptions underlying MALCOM are intended to mimic speech production in that 1) speech sounds are produced as the articulators move slowly through a low-dimensional articulator space, and 2) the speech sound produced at time t is a function of the articulator positions at time t. We will show how to find CM trajectories (analogous to articulator trajectories) corresponding to a known sequence of phonemes. A preliminary test of the theory will be presented, in which articulator trajectories estimated from phonetic transcriptions of spoken words and the temporal positions of the phoneme centers, are compared to measured articulator trajectories.
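The two modeling assumptions in the abstract can be made concrete with a small sketch: a point follows a path through a two-dimensional continuity map, and the probability of each emitted sound code depends only on the point's position at that time. The abstract does not give the form of that dependence, so the choice here (equal code priors and one isotropic Gaussian per code) is an assumption, as are the names `code_posteriors` and `log_likelihood` and all numeric values.

```python
import math


def code_posteriors(x, means, sigma=1.0):
    """P(code | position x), assuming equal priors and isotropic Gaussians."""
    weights = [
        math.exp(-sum((xi - mi) ** 2 for xi, mi in zip(x, m)) / (2 * sigma ** 2))
        for m in means
    ]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(codes, path, means, sigma=1.0):
    """Log-probability of a code sequence given one CM position per time step."""
    return sum(
        math.log(code_posteriors(x, means, sigma)[c]) for c, x in zip(codes, path)
    )


# Hypothetical continuity map with three sound codes centered at these points.
means = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
codes = [0, 0, 1, 1, 2]

# A path that moves gradually past each code's center at the right time...
near_path = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (2.0, 0.5), (2.0, 1.8)]
# ...versus a path that sits still at the center of code 2.
far_path = [(2.0, 2.0)] * 5

ll_near = log_likelihood(codes, near_path, means)
ll_far = log_likelihood(codes, far_path, means)
```

Under these assumptions the gradually moving path explains the observed codes far better than the stationary one. Finding the path that maximizes this quantity, subject to a smoothness constraint, is the kind of trajectory estimation the abstract refers to.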
