Application of neural networks to articulatory motion estimation

Kobayashi, Tetsuo; Yagyu, Mitsuhiko; Shirai, Katsuhiko

doi:10.1109/icassp.1991.150383

Cited by 11 publications

(7 citation statements)

References 3 publications


“…Shirai et al [101] proposed an analysis-by-synthesis approach, which they termed as “Model Matching,” where speech was analyzed to generate articulatory information and then the output was processed by a speech synthesizer such that it had minimal distance from the actual speech signal in the spectral domain. Kobayashi et al [64] proposed a feed-forward MLP architecture with two hidden layers that uses the same data as used in [101] to predict the articulatory parameters and showed faster performance and better estimation accuracy. Regression techniques have been explored a number of times for speech inversion.…”

Section: Introduction (mentioning)

confidence: 99%

Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies

Mitra, Nam, Espy‐Wilson et al. 2010

IEEE J. Sel. Top. Signal Process.


Many different studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process which is usually termed “speech-inversion.” This study aims to propose and compare various machine learning strategies for speech inversion: Trajectory mixture density networks (TMDNs), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural network (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.
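The citation statement about Kobayashi et al. [64] and this abstract both concern feed-forward networks that map acoustic frames directly to articulatory parameters. A minimal sketch of such a regressor with two hidden layers, trained by plain backpropagation, is shown here. The layer sizes, tanh hidden units, linear output layer, learning rate, epoch count, and the synthetic 4-dimensional "acoustic" and 2-dimensional "articulatory" data are all assumptions for illustration, not the configuration used in either paper.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases for one fully connected layer.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def identity(v):
    return v


class TwoHiddenLayerMLP:
    """Feed-forward regressor: acoustic feature frame -> articulatory parameters."""

    def __init__(self, n_in, n_h1, n_h2, n_out, seed=0):
        rng = random.Random(seed)
        self.layers = [init_layer(n_in, n_h1, rng),
                       init_layer(n_h1, n_h2, rng),
                       init_layer(n_h2, n_out, rng)]
        # tanh on both hidden layers, linear output because targets are continuous.
        self.activations = [math.tanh, math.tanh, identity]

    def forward(self, x):
        """Return the activations of every layer: [input, h1, h2, output]."""
        acts = [list(x)]
        for (w, b), act in zip(self.layers, self.activations):
            z = [sum(wi * xi for wi, xi in zip(row, acts[-1])) + bi for row, bi in zip(w, b)]
            acts.append([act(v) for v in z])
        return acts

    def predict(self, x):
        return self.forward(x)[-1]

    def train_step(self, x, y, lr):
        """One stochastic gradient step on 0.5 * squared error (backpropagation)."""
        acts = self.forward(x)
        delta = [o - t for o, t in zip(acts[-1], y)]  # dE/dz at the linear output
        for i in reversed(range(len(self.layers))):
            w, b = self.layers[i]
            inp = acts[i]
            if i > 0:
                # Propagate the error back through the (not yet updated) weights and tanh.
                back = [sum(w[j][k] * delta[j] for j in range(len(delta))) for k in range(len(inp))]
                prev_delta = [g * (1.0 - a * a) for g, a in zip(back, inp)]
            for j in range(len(delta)):
                for k in range(len(inp)):
                    w[j][k] -= lr * delta[j] * inp[k]
                b[j] -= lr * delta[j]
            if i > 0:
                delta = prev_delta


def mse(model, data):
    """Mean over frames of the squared error summed across output channels."""
    return sum(sum((p - t) ** 2 for p, t in zip(model.predict(x), y)) for x, y in data) / len(data)


def toy_parallel_data(n, seed=1):
    # Hypothetical stand-in for parallel acoustic/articulatory frames.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        y = [math.sin(x[0]) + 0.5 * x[1], x[2] * x[3]]
        data.append((x, y))
    return data


data = toy_parallel_data(200)
model = TwoHiddenLayerMLP(n_in=4, n_h1=8, n_h2=8, n_out=2)
before = mse(model, data)
for epoch in range(30):
    for x, y in data:
        model.train_step(x, y, lr=0.05)
after = mse(model, data)
print(f"training MSE: {before:.4f} -> {after:.4f}")
```

Each call to `train_step` makes one stochastic gradient update; the printed values only show that the toy mapping is being learned, not any accuracy figure from the cited work.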


“…Previous attempts to recover articulatory movement from the speech signal involved building a mapping from the acoustic domain to the articulatory domain, either manually or constructed automatically from parallel data [4], [5], [6], [7], [8], [9], [10], [11], [12]. Variations of neural networks [5], [13], [6], [11] have become popular in the latter category. Often the inversion system is built separately from the recognition framework, particularly because the slowly varying nature of articulation may be best modelled in a different way to speech acoustics which change more rapidly, and are noisier.…”

Section: Introduction (mentioning)

confidence: 99%

Acoustic-Articulatory Modeling With the Trajectory HMM

Zhang, Renals 2008

IEEE Signal Process. Lett.


In this letter, we introduce an HMM-based inversion system to recover articulatory movements from speech acoustics. Trajectory HMMs are used as generative models for modelling articulatory data. Experiments on the MOCHA-TIMIT corpus indicate that the jointly trained acoustic-articulatory models are more accurate (lower RMS error) than the separately trained ones, and that trajectory HMM training results in greater accuracy compared with conventional maximum likelihood HMM training. Moreover, the system has the ability to synthesise articulatory movements directly from a textual representation.
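The statement above mentions mappings "constructed automatically from parallel data", and this abstract reports accuracy as RMS error. A minimal sketch of both ideas follows: a nearest-neighbour codebook built from parallel acoustic/articulatory frames, scored with per-channel RMS error. The codebook lookup is only one simple, illustrative instance of a data-driven mapping; it is not the trajectory HMM described in the letter, and the function names and toy data are hypothetical.

```python
import math


def build_codebook(acoustic_frames, articulatory_frames):
    """Pair each acoustic vector with the articulatory vector recorded at the same time."""
    if len(acoustic_frames) != len(articulatory_frames):
        raise ValueError("parallel data needs one articulatory frame per acoustic frame")
    return list(zip(acoustic_frames, articulatory_frames))


def invert(codebook, acoustic, k=1):
    """Estimate articulation as the mean of the k entries nearest in acoustic space."""
    nearest = sorted(codebook, key=lambda entry: math.dist(entry[0], acoustic))[:k]
    n_dims = len(nearest[0][1])
    return [sum(entry[1][d] for entry in nearest) / len(nearest) for d in range(n_dims)]


def rms_error(estimated, measured):
    """Root-mean-square error between two equal-length single-channel trajectories."""
    if not estimated or len(estimated) != len(measured):
        raise ValueError("trajectories must be non-empty and of equal length")
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / len(estimated))


def per_channel_rms(estimated_frames, measured_frames):
    """RMS error for each articulatory channel, given frame-major lists of vectors."""
    n_channels = len(measured_frames[0])
    return [rms_error([f[c] for f in estimated_frames], [f[c] for f in measured_frames])
            for c in range(n_channels)]


# Hypothetical toy data: 2-D "acoustic" frames paired with 1-D "articulator" positions.
codebook = build_codebook([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], [[10.0], [20.0], [30.0]])
test_acoustic = [[0.9, 0.1], [4.0, 4.5]]
test_articulatory = [[19.0], [30.0]]
estimates = [invert(codebook, a, k=1) for a in test_acoustic]
print(estimates, per_channel_rms(estimates, test_articulatory))
```

Here the estimates are `[[20.0], [30.0]]` and the single channel's RMS error is sqrt(0.5), since only the first frame is off, by 1.0.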


“…Nonlinear mapping of two different observation spaces is of great interest for both theoretical and practical purposes. In the area of speech processing, nonlinear mapping has been applied to noise enhancement [1,32], articulatory motion estimation [29,18], and speech recognition [16]. Neural networks have been used successfully to transform data of a new speaker to a reference speaker for speaker-adaptive speech recognition [11].…”

Section: Introduction (mentioning)

confidence: 99%

Minimizing speaker variation effects for speaker-independent speech recognition

Huang 1992

Proceedings of the Workshop on Speech and Natural Language - HLT '91


For speaker-independent speech recognition, speaker variation is one of the major error sources. In this paper, a speaker-independent normalization network is constructed such that speaker variation effects can be minimized. To achieve this goal, multiple speaker clusters are constructed from the speaker-independent training database. A codeword-dependent neural network is associated with each speaker cluster. The cluster that contains the largest number of speakers is designated as the golden cluster. The objective function is to minimize distortions between acoustic data in each cluster and the golden speaker cluster. Performance evaluation showed that the speaker-normalized front-end reduced the error rate by 15% for the DARPA Resource Management speaker-independent speech recognition task.
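The abstract describes choosing the largest speaker cluster as the "golden" cluster and training codeword-dependent mappings that minimize distortion toward it. A minimal sketch of that structure is shown here, with one major simplification: each codeword-dependent neural network is replaced by a single additive shift per (cluster, codeword), a linear stand-in chosen because its least-squares solution has a closed form. The function names, data layout, and toy values are hypothetical.

```python
from collections import defaultdict


def golden_cluster(speaker_to_cluster):
    """Return the id of the cluster that contains the most speakers."""
    counts = defaultdict(int)
    for cluster in speaker_to_cluster.values():
        counts[cluster] += 1
    # Break ties by the smaller cluster id so the choice is deterministic.
    return min(counts, key=lambda c: (-counts[c], c))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def codeword_shifts(frames, golden):
    """Fit one additive shift per (cluster, codeword) toward the golden cluster.

    frames maps (cluster, codeword) -> list of feature vectors. Minimizing the
    summed squared distance ||x + s - g||^2 over all pairs (x in cluster, g in
    golden) for the same codeword gives s = mean(golden) - mean(cluster).
    """
    shifts = {}
    for (cluster, codeword), vectors in frames.items():
        target = frames.get((golden, codeword))
        if cluster == golden or not target:
            continue  # golden data stays unchanged; skip codewords the golden cluster lacks
        shifts[(cluster, codeword)] = [g - c for g, c in zip(mean_vector(target), mean_vector(vectors))]
    return shifts


def normalize(vector, cluster, codeword, shifts):
    """Map a frame toward the golden cluster; frames without a learned shift pass through."""
    shift = shifts.get((cluster, codeword))
    if shift is None:
        return list(vector)
    return [v + s for v, s in zip(vector, shift)]


# Hypothetical example: four speakers in three clusters, two codewords.
speakers = {"spk_a": 0, "spk_b": 1, "spk_c": 1, "spk_d": 2}
golden = golden_cluster(speakers)
frames = {
    (1, "k1"): [[1.0, 2.0], [3.0, 4.0]],
    (0, "k1"): [[0.0, 0.0]],
    (2, "k2"): [[5.0, 5.0]],
}
shifts = codeword_shifts(frames, golden)
print(golden, shifts, normalize([0.0, 0.0], 0, "k1", shifts))
```

Cluster 1 holds two speakers, so it becomes the golden cluster; cluster 0's frames for codeword `k1` are shifted by `[2.0, 3.0]`, while cluster 2 gets no shift because the golden cluster has no `k2` data. A neural mapping, as in the abstract, would replace the constant shift with a nonlinear function of the input frame.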
