Interspeech 2020
DOI: 10.21437/interspeech.2020-2684
Acoustic Feature Extraction with Interpretable Deep Neural Network for Neurodegenerative Related Disorder Classification

Abstract: Speech-based automatic approaches for detecting neurodegenerative disorders (ND) and mild cognitive impairment (MCI) have received more attention recently due to being non-invasive and potentially more sensitive than current pen-and-paper tests. The performance of such systems is highly dependent on the choice of features in the classification pipeline. In particular for acoustic features, arriving at a consensus for a best feature set has proven challenging. This paper explores using deep neural network for e…

Cited by 12 publications (11 citation statements)
References 21 publications
“…For example, it has been shown that the first layers of end-to-end convolutional neural networks that learn representations from raw audio data extract features that are similar to the spectrogram or energies in mel-frequency bands [53][54][55]. Additionally, some works have addressed the design of the first layers of these networks to tailor the feature extraction stage using parametric filters [56][57][58] or trainable hand-crafted kernels [59,60]. Attention mechanisms have been used to bring interpretability to neural networks in speech and music emotion recognition [61,62] and in music auto-tagging [63].…”
Section: Relation With Previous Work
confidence: 99%
“…Previous studies [16] have shown that Sinc-CLA architecture has a good performance and interpretability in classifying recordings from people living with mild cognitive impairment, neurodegenerative disorders, or healthy controls. The multi-task Sinc-CLA system introduced in this paper is shown in Figure 3.…”
Section: End-to-end System
confidence: 99%
“…The SincNet Layer and CNN layers are shared by the two tasks, but the bi-directional LSTM and its following layers are separately trained with a specific target (age or MMSE). The detailed description of each functional layer can be found in Section 3.4 of this paper and in [16].…”
Section: End-to-end System
confidence: 99%
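The quoted passage describes hard parameter sharing: one trunk (SincNet + CNN) feeds two task-specific heads (age and MMSE). A minimal NumPy forward-pass sketch of that layout, with illustrative layer sizes and names that are assumptions, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_dense(x, w, b):
    """One fully connected layer with ReLU activation."""
    return np.maximum(x @ w + b, 0.0)

# Shared trunk, standing in for the SincNet + CNN front end.
w_shared, b_shared = 0.1 * rng.standard_normal((40, 64)), np.zeros(64)

# Task-specific heads, standing in for the per-task BiLSTM + output layers.
w_age, b_age = 0.1 * rng.standard_normal((64, 1)), np.zeros(1)
w_mmse, b_mmse = 0.1 * rng.standard_normal((64, 1)), np.zeros(1)

def forward(features):
    shared = relu_dense(features, w_shared, b_shared)  # computed once for both tasks
    age = shared @ w_age + b_age                       # age regression head
    mmse = shared @ w_mmse + b_mmse                    # MMSE regression head
    return age, mmse

age_pred, mmse_pred = forward(rng.standard_normal((8, 40)))  # batch of 8 feature vectors
```

Sharing the trunk means both targets back-propagate into the same feature extractor, which is the usual motivation for multi-task training on related clinical labels.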