Interspeech 2019
DOI: 10.21437/interspeech.2019-1206

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Abstract: We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not provide interpretable outputs. On the contrary, we show that this latent space su…

Cited by 21 publications (17 citation statements)
References 19 publications
“…[4,5,6]. Recent works have leveraged cycle-consistent adversarial training [7], auxiliary decoders for detection of dysarthria and reconstruction [8], generative adversarial networks for unpaired voice conversion [9], cross-modal knowledge distillation for dysarthric speech reconstruction [10], and spectral conversion using multiple linear regression-based frequency warping predictions [11]. ASR for Atypical Speech: There is growing interest in personalizing ASR systems for atypical speech patterns.…”
Section: Related Work
confidence: 99%
“…GRU is a simplified architecture with efficiency comparable to that of LSTM. These two approaches have been adopted for building automatic speech assessment systems [10, 16-19], e.g., the work done by Korzekwa et al. on dysarthric speech [16].…”
Section: Automatic Assessment Approaches
confidence: 99%
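To make the GRU-vs-LSTM comparison above concrete, here is a minimal sketch of a single GRU step in plain NumPy: the GRU uses two gates (update and reset) instead of the LSTM's three, which is why it is often described as a simplified, comparably effective recurrent unit. The dimensions, weights, and function names below are illustrative assumptions, not the cited systems' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: two gates (update z, reset r) versus the LSTM's three."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate hidden state
    return (1 - z) * h + z * h_tilde           # interpolate old and new state

# Toy dimensions: 3-dim input feature per frame, 4-dim hidden state.
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = [rng.standard_normal(s) * 0.1
          for s in [(d_h, d_in), (d_h, d_h)] * 3]  # Wz,Uz, Wr,Ur, Wh,Uh
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):  # a 5-frame toy sequence
    h = gru_cell(x, h, *params)
print(h.shape)  # (4,)
```

Because the new state is a convex combination of the previous state and a tanh-bounded candidate, the hidden activations stay in (-1, 1), one reason these units train stably on long speech sequences.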
“…Mel-frequency cepstral coefficients (MFCCs) are commonly used in speech assessment systems for acoustic modeling [10, 17-19, 24] and feature extraction [25, 26]. While deep learning models have recently attracted intense attention, the Mel spectrogram is also becoming increasingly popular [10, 12, 16]. For our experiments, we implemented MFCC and Mel spectrogram feature extraction in Python using the librosa library [27].…”
Section: Speech Representation
confidence: 99%
“…Recently, benefiting from the powerful representational capacity of deep neural networks in acoustic and language modeling, sequence-to-sequence (seq2seq) models have progressively become dominant on DSR tasks [1,4,3,5]. They typically discard the frame-independence assumption in Hidden Markov model (HMM)-based models [6] but implicitly learn language models directly by optimizing word error rate (WER) [7].…”
Section: Introduction
confidence: 99%
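The seq2seq DSR systems cited above are evaluated against word error rate (WER), which the quoted passage says they optimize directly. As a reference point, WER is word-level Levenshtein distance normalized by reference length; a minimal sketch (illustrative, not any cited system's scoring code) is:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # substitution (or match)
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

print(wer("the cat sat", "the cat sat on"))  # 1 insertion / 3 ref words
```

Note that WER is not differentiable, so "optimizing WER" in seq2seq training is done indirectly, e.g. via sequence-level training criteria or by selecting models on held-out WER.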