[Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing 1991
DOI: 10.1109/icassp.1991.150383
Application of neural networks to articulatory motion estimation

Abstract: This paper discusses an application of neural networks to the problem of estimating the motion of articulatory organs from speech waves. Recently, neural networks (NN) have been studied extensively, and it has been proved that a three- or four-layer feed-forward network … In this paper, we apply this feature of NN to the articulatory parameter estimation problem. The evaluation test is performed using the vowel data in 5201 tokens in the ATR word database. …
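The mapping the abstract describes — acoustic features in, articulatory parameters out, via a small feed-forward network — can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the layer sizes, the sigmoid nonlinearity, and the 16-dimensional acoustic input are all assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's code): a feed-forward network with
# two hidden layers mapping an acoustic feature vector (e.g. cepstral
# coefficients) to a small set of articulatory parameters.
# All dimensions and the sigmoid nonlinearity are assumptions.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_layer(n_in, n_out):
    # small random weights and a zero bias vector
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

# assumed sizes: 16 acoustic features -> two hidden layers -> 5 articulatory parameters
W1, b1 = init_layer(16, 32)
W2, b2 = init_layer(32, 32)
W3, b3 = init_layer(32, 5)

def estimate_articulatory(acoustic):
    """Forward pass: acoustic feature vector -> articulatory parameter vector."""
    h1 = sigmoid(acoustic @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    return h2 @ W3 + b3  # linear output layer

params = estimate_articulatory(rng.normal(size=16))
```

In practice the weights would be trained by backpropagation on paired acoustic/articulatory data; the sketch shows only the inference-time mapping.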

Cited by 11 publications (7 citation statements) · References 3 publications
“…Shirai et al [101] proposed an analysis-by-synthesis approach, which they termed as “Model Matching,” where speech was analyzed to generate articulatory information and then the output was processed by a speech synthesizer such that it had minimal distance from the actual speech signal in the spectral domain. Kobayashi et al [64] proposed a feed-forward MLP architecture with two hidden layers that uses the same data as used in [101] to predict the articulatory parameters and showed faster performance and better estimation accuracy. Regression techniques have been explored a number of times for speech inversion.…”
Section: Introduction
confidence: 99%
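The "Model Matching" approach quoted above — synthesize speech from candidate articulatory parameters and search for the parameters minimizing spectral distance to the observed signal — can be sketched with a toy forward model. Everything here is an assumption for illustration: `synthesize` stands in for a real articulatory synthesizer, and plain random search stands in for whatever optimizer the original system used.

```python
import numpy as np

# Hedged sketch of the analysis-by-synthesis ("Model Matching") idea:
# find articulatory parameters whose synthesized spectrum best matches
# the observed spectrum. synthesize() is a toy stand-in, not a real
# articulatory synthesizer.

def synthesize(params):
    # toy forward model: parameters weight cosine components of a "spectrum"
    freqs = np.linspace(0, np.pi, 64)
    return sum(p * np.cos((i + 1) * freqs) for i, p in enumerate(params))

def spectral_distance(s1, s2):
    return float(np.mean((s1 - s2) ** 2))

def model_matching(observed, n_params=3, iters=200, step=0.05, seed=0):
    """Random-search minimizer over articulatory parameters (illustrative)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(n_params)
    best_d = spectral_distance(synthesize(best), observed)
    for _ in range(iters):
        cand = best + step * rng.normal(size=n_params)
        d = spectral_distance(synthesize(cand), observed)
        if d < best_d:
            best, best_d = cand, d
    return best, best_d

true_params = np.array([0.8, -0.3, 0.5])
observed = synthesize(true_params)
est, dist = model_matching(observed)
```

The contrast with the MLP approach is that matching requires an iterative search per utterance, whereas a trained feed-forward network produces the estimate in a single pass — consistent with the faster performance the citing paper reports.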
“…Previous attempts to recover articulatory movement from the speech signal involved building a mapping from the acoustic domain to the articulatory domain, either manually or constructed automatically from parallel data [4], [5], [6], [7], [8], [9], [10], [11], [12]. Variations of neural networks [5], [13], [6], [11] have become popular in the latter category. Often the inversion system is built separately from the recognition framework, particularly because the slowly varying nature of articulation may be best modelled in a different way to speech acoustics which change more rapidly, and are noisier.…”
Section: Introduction
confidence: 99%
“…Nonlinear mapping of two different observation spaces is of great interest for both theoretical and practical purposes. In the area of speech processing, nonlinear mapping has been applied to noise enhancement [1,32], articulatory motion estimation [29,18], and speech recognition [16]. Neural networks have been used successfully to transform data of a new speaker to a reference speaker for speaker-adaptive speech recognition [11].…”
Section: Introduction
confidence: 99%