2014
DOI: 10.1007/s11042-014-2183-z

Acoustic to articulatory mapping with deep neural network

Abstract: Synthetic talking avatars have been demonstrated to be very useful in human-computer interaction. In this paper, we discuss the problem of acoustic-to-articulatory mapping and explore different kinds of models to describe the mapping function. We try a general linear model (GLM), a Gaussian mixture model (GMM), an artificial neural network (ANN) and a deep neural network (DNN) for the problem. Taking advantage of the fact that a neural network's prediction stage can be finished in a very short time (e.g. in real time), we develo…
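The abstract names a general linear model (GLM) as the simplest of the compared mappings. As a minimal sketch, assuming the GLM is an affine map from one acoustic feature frame to one articulatory frame fitted by least squares (the paper's actual features, context windows and regularization are not given here), fitting reduces to solving the normal equations. The names `fit_glm` and `predict_glm`, the small ridge term and the toy data are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B for X by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]  # scale pivot row so the pivot is 1
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_glm(x: List[Vector], y: List[Vector], ridge: float = 1e-6) -> Matrix:
    """Least-squares fit of y ~ W^T [x, 1]; returns W with shape (d_in + 1, d_out).

    The tiny ridge term (hypothetical) only guards against a singular X^T X.
    """
    xa = [row + [1.0] for row in x]  # append a constant input for the bias
    d, k = len(xa[0]), len(y[0])
    # Normal equations: (X^T X + ridge * I) W = X^T Y
    xtx = [[sum(r[i] * r[j] for r in xa) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[c] for r, t in zip(xa, y)) for c in range(k)]
           for i in range(d)]
    return solve(xtx, xty)


def predict_glm(w: Matrix, x: Vector) -> Vector:
    """Map one acoustic frame to one articulatory frame."""
    xa = x + [1.0]
    return [sum(xi * w[i][c] for i, xi in enumerate(xa)) for c in range(len(w[0]))]


# Hypothetical usage: 3-dimensional "acoustic" frames, 2-dimensional "articulatory" targets.
frames = [[i / 10, (i * i % 7) / 7, (3 * i % 5) / 5] for i in range(30)]
targets = [[2 * f[0] - f[1] + 0.5, f[2] + 3 * f[0] - 1.0] for f in frames]
weights = fit_glm(frames, targets)
print(predict_glm(weights, [0.2, -0.4, 0.1]))  # close to [1.3, -0.3]
```

Because every output frame is an independent affine function of its input frame, a GLM cannot capture the non-linear relationship discussed in the citation statements below; that limitation is what the GMM, ANN and DNN variants are meant to address.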

Cited by 21 publications (12 citation statements); references 29 publications.
“…The relation between the acoustic features and articulatory movements is known to be non-linear and non-unique [14,3].…”
Section: Proposed Approach; citation type: mentioning
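The non-uniqueness mentioned in this citation can be illustrated with a deliberately simple toy, which is not taken from the paper: a hypothetical acoustic feature that depends only on the square of an articulator position, so positions y and -y are indistinguishable from the acoustics. A least-squares regressor then predicts the average of the two valid positions, which matches neither.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy "articulatory" positions, symmetric around zero (hypothetical data).
ys = [k / 10 for k in range(-10, 11) if k != 0]
# Toy "acoustic" feature that depends only on y**2, so y and -y sound identical.
xs = [y * y for y in ys]

slope, intercept = fit_line(xs, ys)
# For an acoustic value of 0.25 the physically valid positions are -0.5 and +0.5,
# but the regression returns the conditional mean, which is about 0.
pred_at_025 = slope * 0.25 + intercept
valid_at_025 = (-0.5, 0.5)
print(pred_at_025, valid_at_025)
```

Handling this requires either modeling the full multimodal conditional distribution and choosing among its modes, or adding context (for example neighboring frames) that disambiguates the configurations.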
“…There are several applications of AAI including speech recognition [2,3], speech synthesis [4], speaker verification [5] and multimedia applications [6,7,8]. For subject dependent AAI (SD-AAI), various approaches have been proposed in the literature including codebook [9,10], Gaussian mixture model (GMM) [11], Hidden Markov Model (HMM) [12], mixture trajectory model [13], Deep Neural Network (DNN) [14,15,16]. All these approaches need parallel acoustic-articulatory data for training AAI model, which, in turn, requires recording of speech and simultaneous motion of articulators from a subject of interest.…”
Section: Introduction; citation type: mentioning
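Codebook methods [9,10] are listed above as one family of subject-dependent AAI approaches trained on parallel acoustic-articulatory data. A minimal sketch of the general idea, assuming a plain nearest-neighbor lookup with Euclidean distance (the cited works may add vector quantization, trajectory smoothing or other refinements): store paired frames, find the k acoustically closest entries for a query frame, and average their articulatory vectors. `codebook_invert` and the example entries are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One codebook entry: (acoustic vector, articulatory vector) recorded in parallel.
Pair = Tuple[Sequence[float], Sequence[float]]


def codebook_invert(codebook: List[Pair], acoustic: Sequence[float], k: int = 1) -> List[float]:
    """Return the mean articulatory vector of the k entries whose acoustic
    vectors are closest (Euclidean distance) to the query frame."""
    ranked = sorted(codebook, key=lambda pair: math.dist(pair[0], acoustic))
    nearest = ranked[:k]
    dim = len(nearest[0][1])
    return [sum(art[i] for _, art in nearest) / k for i in range(dim)]


# Hypothetical 2-D acoustic / 2-D articulatory codebook.
example_codebook = [
    ([0.0, 0.0], [1.0, 2.0]),
    ([1.0, 0.0], [3.0, 4.0]),
    ([0.0, 1.0], [5.0, 6.0]),
]
print(codebook_invert(example_codebook, [0.9, 0.1]))       # [3.0, 4.0]
print(codebook_invert(example_codebook, [0.9, 0.1], k=2))  # [2.0, 3.0]
```

The lookup makes the dependence on parallel data explicit: every possible output is a stored articulatory frame (or an average of stored frames) from the recorded subject.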
“…On the other hand, the deep neural network can extract more representative features from the raw data in a pre-training way for obtaining more accurate prediction results. Due to the superiority in feature extraction and model fitting, deep learning has attracted a great amount of attention around the world, and has been widely applied in various fields, such as green buildings [27,28], image processing [29][30][31][32], speech recognition [33,34], and intelligent traffic management systems [35][36][37]. As a novel deep learning method, the long-short-term memory network (LSTM) can make full use of the historical information due to its special structure [38].…”
Section: Introduction; citation type: mentioning
“…After several rounds of review, ten papers were finally selected to be included in this special issue. These papers can be categorized into three topics: avatar animation [3,9,13], speech synthesis [11,12,16,18] and human emotion/behavior analysis [5,10,17].…”
Citation type: mentioning
“…Significant performance gain in head motion prediction is reported. Taking the advantage of the rich non-linear learning ability, Wu et al [13] develop a DNN approach for real-time speech driven talking avatar. Specifically, the input of the system is acoustic speech and the output is articulatory movements on a three-dimensional avatar.…”
Citation type: mentioning
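The citation above credits the DNN approach with real-time prediction. One reason a trained feed-forward network is cheap at prediction time is that each output frame costs a fixed number of multiply-accumulates. The sketch below assumes a plain multilayer perceptron with ReLU hidden layers, a linear output layer and a symmetric context window of acoustic frames; all function names and the layer sizes in the usage lines are hypothetical, not the paper's configuration.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights stored as rows, bias)


def dense(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine layer: returns W x + b, with each row of W of length len(x)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, vi) for vi in v]


def mlp_forward(layers: List[Layer], x: Vector) -> Vector:
    """Feed-forward pass: ReLU after hidden layers, linear output layer
    (a regression target such as articulator coordinates)."""
    h = x
    for i, (w, b) in enumerate(layers):
        h = dense(w, b, h)
        if i < len(layers) - 1:
            h = relu(h)
    return h


def stack_context(frames: List[Vector], t: int, width: int) -> Vector:
    """Concatenate frames t-width .. t+width, repeating edge frames at the
    start and end of the utterance."""
    out: Vector = []
    for offset in range(-width, width + 1):
        idx = min(max(t + offset, 0), len(frames) - 1)
        out.extend(frames[idx])
    return out


def macs_per_frame(sizes: List[int]) -> int:
    """Multiply-accumulate count of one forward pass through layers of these sizes."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))


# Hypothetical sizes: 11 stacked frames of 39 features, three hidden layers of
# 512 units, 6 articulatory outputs.
print(macs_per_frame([11 * 39, 512, 512, 512, 6]))  # 747008
```

The cost per frame is independent of utterance length, so whether a given network meets a real-time budget can be estimated from its layer sizes and the frame rate alone.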