2017
DOI: 10.1109/taslp.2017.2757263

Direct Speech Reconstruction From Articulatory Sensor Data by Machine Learning

Abstract: This paper describes a technique which generates speech acoustics from articulator movements. Our motivation is to help people who can no longer speak following laryngectomy, a procedure which is carried out tens of thousands of times per year in the Western world. Our method for sensing articulator movement, Permanent Magnetic Articulography, relies on small, unobtrusive magnets attached to the lips and tongue. Changes in magnetic field caused by magnet movements are sensed and form the input to a process whi…

Cited by 67 publications (62 citation statements) | References 42 publications
“…There are two distinct ways of SSI solutions, namely 'direct synthesis' and 'recognition-and-synthesis' [21]. In the first case, the speech signal is generated without an intermediate step, directly from the articulatory data, typically using vocoders [4,5,6,8,9,10,11,15,17]. In the second case, silent speech recognition (SSR) is applied on the biosignal which extracts the content spoken by the person (i.e.…”
Section: Introduction
confidence: 99%
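The two SSI architectures contrasted in this excerpt can be sketched as function pipelines. Every function below is a hypothetical stub chosen for illustration, not the paper's implementation or API:

```python
# Two silent-speech-interface (SSI) architectures, as described in the excerpt.
# All functions are illustrative stand-ins operating on toy numbers.

def extract_features(biosignal):
    # e.g. reduce each frame of sensor channels to a feature value
    return [sum(frame) / len(frame) for frame in biosignal]

def direct_synthesis(biosignal):
    # 'direct synthesis': articulatory data -> vocoder parameters -> waveform,
    # with no intermediate text representation
    features = extract_features(biosignal)
    vocoder_params = [f * 0.5 for f in features]   # stand-in for a learned regression
    return vocoder_params                          # a vocoder would render these

def recognition_and_synthesis(biosignal):
    # 'recognition-and-synthesis': biosignal -> recognized text -> TTS
    features = extract_features(biosignal)
    text = "hello" if features and features[0] > 0 else ""  # stand-in for SSR
    return list(text)                              # a TTS system would speak this

signal = [[0.2, 0.4], [0.1, 0.3]]
print(direct_synthesis(signal))
print(recognition_and_synthesis(signal))
```

The structural difference visible in the sketch is the text bottleneck in the second pipeline: direct synthesis never commits to a discrete transcription.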
“…In another paper a multimodal Deep AutoEncoder was used to synthesize sung vowels based on ultrasound recordings and a video of the lips [27]. Gonzalez and his colleagues compared GMM, DNN and RNN [36] for PMA-based direct synthesis, Csapó et al used DNNs to predict the spectral parameters [6] and F0 [28] of a vocoder using UTI as articulatory input. For the prediction of the V/U flag and F0 using articulatory input, multiple DNN architectures were compared, including DNN, RNN and LSTM neural networks [46], [47].…”
Section: Deep Neural Networks in the Inversion and Mapping Fields
confidence: 99%
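At its core, the DNN-based direct synthesis these excerpts describe is a frame-wise regression from articulatory feature vectors to vocoder parameters. A minimal sketch with a one-hidden-layer network trained by plain gradient descent on synthetic data — the dimensions, learning rate, and data are arbitrary stand-ins, not those of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 frames of 12-dim "articulatory" features mapped
# to 4-dim "vocoder parameter" targets by an unknown linear rule plus noise.
X = rng.normal(size=(200, 12))
true_W = 0.3 * rng.normal(size=(12, 4))
Y = X @ true_W + 0.01 * rng.normal(size=(200, 4))

# One-hidden-layer MLP: 12 -> 16 (tanh) -> 4 (linear), mean-squared-error loss.
W1 = 0.1 * rng.normal(size=(12, 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.normal(size=(16, 4));  b2 = np.zeros(4)
lr = 0.1

for step in range(3000):
    H = np.tanh(X @ W1 + b1)        # hidden activations
    P = H @ W2 + b2                 # predicted vocoder parameters
    err = P - Y
    loss = (err ** 2).mean()
    # Backpropagate the MSE loss
    dP = 2 * err / err.size
    dW2 = H.T @ dP; db2 = dP.sum(axis=0)
    dH = dP @ W2.T
    dZ = dH * (1 - H ** 2)          # derivative of tanh
    dW1 = X.T @ dZ; db1 = dZ.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final MSE: {loss:.5f}")     # should fall far below the initial ~1.0
```

The cited systems differ mainly in what replaces this toy network (GMMs, deeper DNNs, RNNs or LSTMs for temporal context) and in the real articulatory inputs (PMA, ultrasound) and vocoder parameter sets used.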
“…Learning models are highly relevant here: methods such as the SVM are widely used for their non-linear kernel mapping of the data into a very high-dimensional space, but a single kernel is not effective at handling the many properties of speech-emotion datasets. The authors (C. Zha, P. Yang, X. Zhang and L. Zhao, 2016) address this uni-functionality by introducing multiple kernels, tested on the Aibo dataset, which exhibits better results compared to the single-kernel SVM [9].…”
Section: Related Work
confidence: 99%
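The multiple-kernel idea in the last excerpt amounts to replacing a single kernel with a weighted combination, k(x, z) = w1·k_rbf(x, z) + w2·k_lin(x, z), which the learner then uses like any other kernel. A toy sketch with kernel ridge regression on synthetic two-class data — the weights here are fixed by hand, whereas true multiple-kernel learning would optimize them, and nothing below reproduces the cited Aibo experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

def k_rbf(A, B, gamma=0.5):
    # Gaussian (RBF) kernel matrix between the row vectors of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def k_lin(A, B):
    # Linear kernel matrix
    return A @ B.T

def k_combined(A, B, w=(0.7, 0.3)):
    # Fixed-weight multiple-kernel combination: a convex sum of valid kernels
    # is itself a valid positive-semidefinite kernel
    return w[0] * k_rbf(A, B) + w[1] * k_lin(A, B)

# Toy binary "emotion" data: two Gaussian blobs in 2-D
X = np.vstack([rng.normal(-1, 0.5, (30, 2)), rng.normal(1, 0.5, (30, 2))])
y = np.array([-1.0] * 30 + [1.0] * 30)

# Kernel ridge regression: alpha = (K + lambda*I)^-1 y, sign() for the class
lam = 0.1
K = k_combined(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

pred = np.sign(K @ alpha)
accuracy = (pred == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Swapping `k_combined` for `k_rbf` alone recovers the single-kernel case the excerpt criticizes; the multiple-kernel variant lets different kernels capture different properties of the data.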