2016
DOI: 10.1016/j.csl.2015.05.005
Integrating articulatory data in deep neural network-based acoustic modeling

Cited by 39 publications (31 citation statements)
References 28 publications
“…Again, we check whether this improvement could be due solely to the extra acoustic data, by training a similar model on only the acoustic input; the result (row 9) is worse, indicating that our improvements are not due to the extra acoustics alone. Row 10 corresponds to a single recognizer trained on the merged acoustic data of XRMB and TIMIT; this model does surprisingly well, but still… [footnote 3: In this case we use VCCAP with a 71-frame window acoustic input.] Table 2: PER (%) for XRMB→TIMIT.…”
Section: XRMB → TIMIT
confidence: 99%
“…The input to the DNNs consisted of 5 concatenated MFCC vectors (a context size used in all our previous work [3]) used to estimate a vector of 16 AFs. The 39 MFCCs were previously normalized to have 0 mean and 1 standard deviation.…”
Section: DNNs STL- and MTL-based Training
confidence: 99%
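The splicing and normalization described in the quote above can be sketched as follows. This is a minimal NumPy illustration, not the cited authors' code: the frame count and the padding strategy are assumptions; only the 39-dimensional MFCCs, the 5-frame context window, and the zero-mean/unit-variance normalization come from the quoted setup.

```python
import numpy as np

def normalize(feats):
    # Per-dimension zero-mean, unit-variance normalization of the 39 MFCCs.
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

def splice(feats, context=2):
    # Concatenate each frame with its +/- `context` neighbours (5 frames
    # total), padding the utterance edges by repeating the boundary frames.
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    n = feats.shape[0]
    return np.stack(
        [padded[i : i + 2 * context + 1].reshape(-1) for i in range(n)]
    )

mfcc = np.random.randn(100, 39)       # 100 frames of 39 MFCCs (dummy data)
dnn_input = splice(normalize(mfcc))   # shape (100, 195): 5 frames x 39 dims
```

Each spliced row would then feed the DNN that estimates the 16 articulatory features.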
“…Measured vocal tract movements, i.e., articulatory features (AFs), can be beneficial for several speech technology applications, including speech synthesis [1], automatic speech recognition (ASR) [2,3], pronunciation training [4] and speech-driven computer animation [5]. Techniques for measuring AFs range from electromagnetic articulography (EMA) to ultrasound and functional magnetic resonance imaging (fMRI).…”
Section: Introduction
confidence: 99%
“…In AAI, the objective is to estimate the vocal tract shape, which is estimated by the articulator positions based on the uttered speech. AAI can be useful in many speech-based applications, in particular, speech synthesis [1], automatic speech recognition (ASR) [2,3,4] and second language learning [5,6]. Over the years, researchers have addressed this problem employing various machine learning techniques including codebooks [7], Gaussian mixture models (GMM) [8], hidden Markov models (HMM) [9], mixture density networks [10], deep neural networks (DNNs) [11,12,13], and deep recurrent neural networks (RNNs) [14,15,16].…”
Section: Introduction
confidence: 99%