2019 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2019.8852153

Autoencoder-Based Articulatory-to-Acoustic Mapping for Ultrasound Silent Speech Interfaces

Abstract: When using ultrasound video as input, Deep Neural Network-based Silent Speech Interfaces usually rely on the whole image to estimate the spectral parameters required for the speech synthesis step. Although this approach is quite straightforward and permits the synthesis of understandable speech, it has several disadvantages as well. Besides the inability to capture the relations between neighboring regions (i.e. pixels) of the image, this pixel-by-pixel representation of the image is also quite uneconomical. It i…
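
The abstract's core idea is to replace the raw pixel input with a compact code learned by an autoencoder, which then feeds the acoustic mapping network. Below is a minimal sketch of that pipeline, not the authors' exact architecture; the frame size, layer widths, and 128-dimensional bottleneck are illustrative assumptions.

```python
# Sketch: compress each ultrasound frame with an autoencoder and use the
# bottleneck code, rather than raw pixels, as input to the acoustic network.
# The 64x128 frame size and all layer widths are assumptions, not the paper's.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_pixels = 64 * 128            # flattened ultrasound frame (assumed size)
code_dim = 128                 # compact bottleneck representation

inputs = keras.Input(shape=(n_pixels,))
h = layers.Dense(1000, activation="relu")(inputs)
code = layers.Dense(code_dim, activation="relu", name="bottleneck")(h)
h = layers.Dense(1000, activation="relu")(code)
outputs = layers.Dense(n_pixels, activation="sigmoid")(h)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train the autoencoder to reconstruct (normalized) ultrasound frames.
frames = np.random.rand(2000, n_pixels).astype("float32")  # placeholder data
autoencoder.fit(frames, frames, epochs=5, batch_size=64, verbose=0)

# Encoder half: maps a frame to its 128-dim code for the downstream
# articulatory-to-acoustic DNN.
encoder = keras.Model(inputs, code)
compact_features = encoder.predict(frames[:10])
print(compact_features.shape)  # (10, 128)
```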

Cited by 15 publications (17 citation statements). References 44 publications.
“…The connection between linear autoencoders and principal component analysis (PCA) was first proved by Baldi and Hornik (1989), which provides the basis for Theorem 2. The vast applications of autoencoders in domains such as language (Socher et al., 2011; Silberer and Lapata, 2014), speech (Gosztolya et al., 2019) and vision (Pu et al., 2016) suggest that non-linear autoencoders can indeed learn better representations than PCA.…”
Section: A Theoretical Proofs (mentioning)
confidence: 99%
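
The Baldi and Hornik (1989) result quoted above can be checked numerically: a linear autoencoder trained with mean-squared error attains the same reconstruction error as rank-k PCA, because its optimal weights span the top principal subspace. The following self-contained sketch (synthetic data and plain gradient descent, both assumptions for illustration) demonstrates this.

```python
# Numerical illustration of the linear-autoencoder / PCA equivalence:
# a linear AE x_hat = x W1 W2 trained with MSE approaches the rank-k
# PCA reconstruction error on the same data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
X -= X.mean(axis=0)            # center the data, as PCA assumes

k = 5                          # bottleneck width / number of components

# PCA via SVD: rank-k reconstruction error.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T @ Vt[:k]
err_pca = np.mean((X - X_pca) ** 2)

# Linear autoencoder trained by gradient descent on the MSE loss.
W1 = rng.normal(scale=0.1, size=(20, k))
W2 = rng.normal(scale=0.1, size=(k, 20))
lr = 0.05
for _ in range(10000):
    H = X @ W1
    R = H @ W2 - X             # reconstruction residual
    gW2 = H.T @ R / len(X)
    gW1 = X.T @ (R @ W2.T) / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2
err_ae = np.mean((X @ W1 @ W2 - X) ** 2)

print(f"PCA rank-{k} MSE: {err_pca:.4f}")
print(f"Linear AE  MSE:  {err_ae:.4f}")  # converges toward the PCA error
```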
“…If auditory signals cannot be used as input for voice recognition, natural speech must be synthesized from the movements of the mobile organs of the vocal tract (e.g., tongue, lips). A system that performs automatic articulatory-to-acoustic mapping can be a component of such silent speech interfaces (SSIs) [6], and studies are underway both to work around the fact that the voice signal itself cannot be recorded [76] and to reconstruct the voice by converting the articulatory movements of patients who cannot phonate into speech [77]. Table 2 presents a brief summary of the deep learning models so far.…”
Section: Deep Learning Based Voice Recognition (mentioning)
confidence: 99%
“…Therefore, developing an effective algorithm to convert articulatory movements into speech is the main goal of SSI research [97], and deep learning technology has been introduced to achieve this goal. As deep learning has widened the scope of speech technologies such as speech recognition and speech synthesis, recent studies are attempting to solve the problem of articulatory-to-acoustic conversion [76]. In implementing SSI or silent speech recognition (SSR) technologies, applications of deep learning to tasks such as sensor handling, interference reduction, and feature extraction are also increasing to improve recognition performance [7].…”
Section: Deep Learning Based Voice Recognition (mentioning)
confidence: 99%
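
The articulatory-to-acoustic conversion described in these excerpts is, at its simplest, a frame-by-frame regression from articulatory features to vocoder parameters. A hedged sketch follows; the feature and parameter dimensions, layer sizes, and the use of MGC-style spectral targets are assumptions for illustration, not the cited systems' exact design.

```python
# Sketch of articulatory-to-acoustic mapping: a feed-forward DNN regressing
# vocoder spectral parameters from compact articulatory features (e.g., the
# autoencoder codes of ultrasound frames). All dimensions are assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

feat_dim = 128     # articulatory feature vector per frame (assumed)
mgc_dim = 25       # spectral (MGC-style) parameters per frame (assumed)

model = keras.Sequential([
    keras.Input(shape=(feat_dim,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(mgc_dim),     # linear output for regression
])
model.compile(optimizer="adam", loss="mse")

# Placeholder frame-aligned training pairs (articulatory -> acoustic).
X = np.random.rand(5000, feat_dim).astype("float32")
Y = np.random.rand(5000, mgc_dim).astype("float32")
model.fit(X, Y, epochs=3, batch_size=128, verbose=0)

# The predicted spectral parameters would then drive a vocoder to synthesize speech.
Y_hat = model.predict(X[:4])
print(Y_hat.shape)             # (4, 25)
```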
“…This model achieved a recognition accuracy of 80.4% when tested on the database developed in [106], which validated it for visual speech recognition. Deep autoencoders were used in [273], [274] to extract features from ultrasound images, achieving significant gains in both silent ASR and direct synthesis. In [275], multitask learning of speech recognition and synthesis parameters was evaluated in the context of an ultrasound-based SSI system designed to enhance the performance of the individual tasks.…”
Section: Imaging Techniques (mentioning)
confidence: 99%
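
The multitask setup mentioned for [275] can be sketched as a shared encoder with two heads: one classifying frame-level phone labels for silent speech recognition, one regressing synthesis parameters. Everything below (layer sizes, the 40-phone inventory, equal loss weights) is an illustrative assumption, not the cited paper's configuration.

```python
# Sketch of multitask learning for an ultrasound SSI: a shared encoder over
# articulatory features with a recognition head and a synthesis head.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

feat_dim, n_phones, mgc_dim = 128, 40, 25   # assumed dimensions

inputs = keras.Input(shape=(feat_dim,))
shared = layers.Dense(512, activation="relu")(inputs)
shared = layers.Dense(512, activation="relu")(shared)

phone_out = layers.Dense(n_phones, activation="softmax", name="phones")(shared)
mgc_out = layers.Dense(mgc_dim, name="mgc")(shared)

model = keras.Model(inputs, [phone_out, mgc_out])
model.compile(
    optimizer="adam",
    loss={"phones": "sparse_categorical_crossentropy", "mgc": "mse"},
    loss_weights={"phones": 1.0, "mgc": 1.0},  # task balance is a tunable choice
)

# Placeholder frame-level training data for both tasks.
X = np.random.rand(1000, feat_dim).astype("float32")
y_phone = np.random.randint(0, n_phones, size=(1000,))
y_mgc = np.random.rand(1000, mgc_dim).astype("float32")
model.fit(X, {"phones": y_phone, "mgc": y_mgc}, epochs=2, verbose=0)
```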