Dysarthria Speech Detection Using Convolutional Neural Networks with Gated Recurrent Unit

Shih, Dong‐Her; Liao, Ching-Hsien; Wu, Ting-Wei; Xu, Xiaoyin; Shih, Ming‐Hung

doi:10.3390/healthcare10101956

Cited by 17 publications

(4 citation statements)

References 21 publications

“…Furthermore, Kashyap, et al showed that models trained on the "ta-ta-ta" and "British Constitution" tasks were able to capture disease progression over a two-year period [37]. Though some works reported high accuracy using CNN-based and gated recurrent unit-based deep learning models applied to mel spectrograms [48], [49], such results are likely optimistic as they were achieved using methodologies that did not ensure participant independence between training and testing sets [50]. A recent work by Song, et al, however, provides a comparable benchmark for the proposed models' performance [38].…”
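The excerpt above argues that high reported accuracies are likely optimistic when the same participant contributes recordings to both the training and the test set. The sketch below shows a participant-independent (speaker-independent) split. It assumes recordings arrive as hypothetical `(participant_id, item)` pairs, and the function name is illustrative. scikit-learn's `GroupShuffleSplit` and `GroupKFold` provide the same guarantee via their `groups` argument.

```python
import random
from collections import defaultdict


def participant_independent_split(recordings, test_fraction=0.2, seed=0):
    """Split (participant_id, item) pairs so no participant is in both sets.

    Whole participants, not individual recordings, are assigned to train or test.
    """
    by_participant = defaultdict(list)
    for participant_id, item in recordings:
        by_participant[participant_id].append(item)

    participants = sorted(by_participant)  # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(participants)

    n_test = max(1, round(len(participants) * test_fraction))
    test_ids = set(participants[:n_test])

    train, test = [], []
    for participant_id in participants:
        target = test if participant_id in test_ids else train
        target.extend((participant_id, item) for item in by_participant[participant_id])
    return train, test
```

Splitting at the participant level means a model cannot score well simply by recognising a speaker's voice it already heard during training.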

Section: Discussion (mentioning)

confidence: 99%

Sensitive Quantification of Cerebellar Speech Abnormalities Using Deep Learning Models

Vattis, Oubre, Luddy et al. 2024

IEEE Access

Objective: Objective, sensitive, and meaningful disease assessments are critical to support clinical trials and clinical care. Speech changes are one of the earliest and most evident manifestations of cerebellar ataxias. This work aims to develop models that can accurately identify and quantify clinical signs of ataxic speech. Methods: We use convolutional neural networks to capture the motor speech phenotype of cerebellar ataxia based on time and frequency partial derivatives of log-mel spectrogram representations of speech. We train classification models to distinguish patients with ataxia from healthy controls as well as regression models to estimate disease severity. Results: Classification models were able to accurately distinguish healthy controls from individuals with ataxia, including ataxia participants who clinicians rated as having no detectable clinical deficits in speech. Regression models produced accurate estimates of disease severity, were able to measure subclinical signs of ataxia, and captured disease progression over time. Conclusion: Convolutional networks trained on time and frequency partial derivatives of the speech signal can detect sub-clinical speech changes in ataxias and sensitively measure disease change over time. Significance: Learned speech analysis models have the potential to aid early detection of disease signs in ataxias and provide sensitive, low-burden assessment tools in support of clinical trials and neurological care.
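The abstract describes CNN inputs built from time and frequency partial derivatives of a log-mel spectrogram. It does not state the finite-difference scheme or whether the raw spectrogram is also used, so the NumPy sketch below is an assumption: it estimates both derivatives with `np.gradient` and stacks them as two input channels. The function name is hypothetical.

```python
import numpy as np


def spectro_temporal_derivatives(log_mel):
    """Partial derivatives of a log-mel spectrogram along frequency and time.

    log_mel: array of shape (n_mels, n_frames) with at least 2 bins on each axis.
    Returns an array of shape (2, n_mels, n_frames): channel 0 is d/d(mel band),
    channel 1 is d/d(frame). Central differences inside, one-sided at the edges.
    """
    S = np.asarray(log_mel, dtype=float)
    d_freq = np.gradient(S, axis=0)  # change across neighbouring mel bands
    d_time = np.gradient(S, axis=1)  # change across neighbouring frames
    return np.stack([d_freq, d_time])
```

The log-mel input itself could come from, for example, `librosa.feature.melspectrogram` followed by `librosa.power_to_db`. A key property of derivative maps is that they discard the absolute level of the spectrogram and keep only how energy changes across bands and frames.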

“…Other models, including ADD and ADSLA, are classic machine learning methods, like RF and SVM, that are employed for classification problems [22]. When processing voice data, CNNs and GRUs each handle different tasks: CNNs record local patterns, GRUs represent sequential dependencies [39], and cascade convolution models frequently combine multiple convolutional layers.…”
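The excerpt above describes a division of labour: convolutions pick up local patterns, and a GRU models how they evolve over time. The NumPy sketch below shows that CNN-then-GRU pipeline as a forward pass only. It is not the architecture of the cited works. The weights are random and untrained, and the layer sizes, function names, and `(n_frames, n_features)` input layout (for example, mel bands per frame) are assumptions. A real model would be trained in a framework such as PyTorch or Keras.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution (cross-correlation) over time, then ReLU.

    x: (n_frames, n_features), kernels: (n_filters, width, n_features),
    bias: (n_filters,). Returns (n_frames - width + 1, n_filters).
    """
    n_filters, width, _ = kernels.shape
    n_out = x.shape[0] - width + 1
    out = np.empty((n_out, n_filters))
    for t in range(n_out):
        window = x[t:t + width]  # a short local patch of `width` frames
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def gru_last_state(seq, W, U, b):
    """Run a GRU over seq (T, d_in); return the final hidden state (d_h,).

    W, U, b are dicts keyed by 'z' (update gate), 'r' (reset gate) and
    'h' (candidate state) holding (d_in, d_h), (d_h, d_h) and (d_h,) arrays.
    """
    h = np.zeros(U["z"].shape[0])
    for x_t in seq:
        z = sigmoid(x_t @ W["z"] + h @ U["z"] + b["z"])
        r = sigmoid(x_t @ W["r"] + h @ U["r"] + b["r"])
        h_cand = np.tanh(x_t @ W["h"] + (r * h) @ U["h"] + b["h"])
        h = (1.0 - z) * h + z * h_cand  # blend previous state and candidate
    return h


def init_params(n_features, n_filters=8, width=3, d_h=16, seed=0):
    """Random, untrained parameters (sizes are illustrative assumptions)."""
    rng = np.random.default_rng(seed)

    def small(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    return {
        "kernels": small(n_filters, width, n_features),
        "conv_bias": np.zeros(n_filters),
        "W": {k: small(n_filters, d_h) for k in "zrh"},
        "U": {k: small(d_h, d_h) for k in "zrh"},
        "b": {k: np.zeros(d_h) for k in "zrh"},
        "w_out": small(d_h),
        "b_out": 0.0,
    }


def cnn_gru_score(features, params):
    """features: (n_frames, n_features), e.g. log-mel bands per frame."""
    local = conv1d_relu(features, params["kernels"], params["conv_bias"])  # CNN stage
    h = gru_last_state(local, params["W"], params["U"], params["b"])  # GRU stage
    return sigmoid(h @ params["w_out"] + params["b_out"])  # score in (0, 1)
```

For example, `cnn_gru_score(features, init_params(40))` scores a 40-band feature matrix. With untrained weights the output is meaningless as a diagnosis and only demonstrates the data flow.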

Section: Comparative Analysis (mentioning)

confidence: 99%

Automatic dysarthria detection and severity level assessment using CWT-layered CNN model

Sajiha, Radha, Venkata Rao et al. 2024

J AUDIO SPEECH MUSIC PROC.

Dysarthria is a speech disorder that affects the ability to communicate due to articulation difficulties. This research proposes a novel method for automatic dysarthria detection (ADD) and automatic dysarthria severity level assessment (ADSLA) by using a variable continuous wavelet transform (CWT) layered convolutional neural network (CNN) model. To determine their efficiency, the proposed model is assessed using two distinct corpora, TORGO and UA-Speech, comprising both dysarthria patients and healthy subject speech signals. The research study explores the effectiveness of CWT-layered CNN models that employ different wavelets such as Amor, Morse, and Bump. The study aims to analyze the models’ performance without the need for feature extraction, which could provide deeper insights into the effectiveness of the models in processing complex data. Also, raw waveform modeling preserves the original signal’s integrity and nuance, making it ideal for applications like speech recognition, signal processing, and image processing. Extensive analysis and experimentation have revealed that the Amor wavelet surpasses the Morse and Bump wavelets in accurately representing signal characteristics. The Amor wavelet outperforms the others in terms of signal reconstruction fidelity, noise suppression capabilities, and feature extraction accuracy. The proposed CWT-layered CNN model emphasizes the importance of selecting the appropriate wavelet for signal-processing tasks. The Amor wavelet is a reliable and precise choice for applications. The UA-Speech dataset is crucial for more accurate dysarthria classification. Advanced deep learning techniques can simplify early intervention measures and expedite the diagnosis process.
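The abstract compares Amor, Morse, and Bump wavelets inside a CWT layer that turns raw speech into a time-frequency image for the CNN. The NumPy sketch below computes the magnitude scalogram such a layer would produce with an analytic Morlet ("Amor"-style) wavelet. The wavelet is defined in the frequency domain as 2·exp(-(ω - ω0)²/2) for ω > 0 and zero elsewhere. It is not the authors' implementation, and the centre parameter ω0 = 6, the frequency grid, and the function name are assumptions.

```python
import numpy as np


def amor_scalogram(signal, freqs, fs, omega0=6.0):
    """Magnitude CWT of `signal` using an analytic Morlet ("Amor"-style) wavelet.

    signal: 1-D array of samples; freqs: analysis frequencies in Hz;
    fs: sampling rate in Hz; omega0: wavelet centre frequency (rad per unit scale).
    Returns an array of shape (len(freqs), len(signal)) that a 2-D CNN could
    treat as an image.
    """
    x = np.asarray(signal, dtype=float)
    X = np.fft.fft(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(x.size)  # angular frequency in rad/sample
    rows = []
    for f in freqs:
        # Choose the scale whose wavelet is centred on f (in rad/sample: 2*pi*f/fs).
        scale = omega0 / (2.0 * np.pi * f / fs)
        # Analytic Morlet: Gaussian bump on positive frequencies only, zero elsewhere.
        psi_hat = np.where(omega > 0,
                           2.0 * np.exp(-0.5 * (scale * omega - omega0) ** 2),
                           0.0)
        # psi_hat is real, so its conjugate equals itself; filtering = multiply spectra.
        rows.append(np.abs(np.fft.ifft(X * psi_hat)))
    return np.vstack(rows)
```

A call such as `amor_scalogram(x, np.geomspace(50, 4000, 64), fs=16000)` would give a 64-row image for 16 kHz speech. Swapping in a Morse or Bump wavelet would change only the `psi_hat` line.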

“…Since the recursive process in Step 1 is repeated, the amount of calculation is large. A gated-graph sequential neural network (GGS-NN) replaces the recursion process in Step 1 with a Gated Recurrent Unit (GRU), which is the gating mechanism in a recurrent neural network (RNN) and which has better performance on certain smaller datasets and removes the constraints of contraction mapping [86][87][88][89][90][91][92]. The GRU concept can be expressed using the following formula:…”
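The excerpt is cut off before its formula. For reference, a commonly used form of the GRU update is shown below, but the review's own notation may differ. Here σ is the logistic sigmoid, ⊙ is the element-wise product, z_t is the update gate, r_t is the reset gate, and W, U, b are learned weights and biases. Some formulations swap the roles of z_t and 1 - z_t in the last line. In a gated graph network, x_t would be a node's aggregated messages from its neighbours at propagation step t.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```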

Section: Current QSAR (mentioning)

confidence: 99%

Computational Models That Use a Quantitative Structure–Activity Relationship Approach Based on Deep Learning

Matsuzaka, Uesawa 2023

Processes

In the toxicological testing of new small-molecule compounds, it is desirable to establish in silico test methods to predict toxicity instead of relying on animal testing. Since quantitative structure–activity relationships (QSARs) can predict the biological activity from structural information for small-molecule compounds, QSAR applications for in silico toxicity prediction have been studied for a long time. However, in recent years, the remarkable predictive performance of deep learning has attracted attention for practical applications. In this review, we summarize the application of deep learning to QSAR for constructing prediction models, including a discussion of parameter optimization for deep learning.
