Experimental Evaluation of Deep Learning Methods for an Intelligent Pathological Voice Detection System Using the Saarbruecken Voice Database

Lee, Ji-Yeoun

doi:10.3390/app11157149

Cited by 25 publications

(12 citation statements)

References 33 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…This research was also compared to other studies that used the same workflow ( Table 7 ). The key benefit of this study is that it produced better accuracy using the vowel /a/ than prior studies that employed all of the vowels to train the model [ 57 ]. With the LPCCs parameter in the /u/ vowel in men, the CNN classifier attained the best accuracy, 82.69%.…”

Section: Resultsmentioning

confidence: 99%

Neurogenerative Disease Diagnosis in Cepstral Domain Using MFCC with Deep Learning

Alghamdi

Zakariah

Hoang

et al. 2022

Computational and Mathematical Methods in Medicine

View full text Add to dashboard Cite

Because underlying cognitive and neuromuscular activities regulate speech signals, biomarkers in the human voice can provide insight into neurological illnesses. Multiple motor and nonmotor aspects of neurologic voice disorders arise from an underlying neurologic condition such as Parkinson’s disease, multiple sclerosis, myasthenia gravis, or ALS. Voice problems can be caused by disorders that affect the corticospinal system, cerebellum, basal ganglia, and upper or lower motoneurons. According to a new study, voice pathology detection technologies can successfully aid in the assessment of voice irregularities and enable the early diagnosis of voice pathology. In this paper, we offer two deep-learning-based computational models, 1-dimensional convolutional neural network (1D CNN) and 2-dimensional convolutional neural network (2D CNN), that simultaneously detect voice pathologies caused by neurological illnesses or other causes. From the German corpus Saarbruecken Voice Database (SVD), we used voice recordings of sustained vowel /a/ generated at normal pitch. The collected voice signals are padded and segmented to maintain homogeneity and increase the number of samples. Convolutional layers are applied to raw data, and MFCC features are extracted in this project. Although the 1D CNN had the maximum accuracy of 93.11% on test data, model training produced overfitting and 2D CNN, which generalized the data better and had lower train and validation loss despite having an accuracy of 84.17% on test data. Also, 2D CNN outperforms state-of-the-art studies in the field, implying that a model trained on handcrafted features is better for speech processing than a model that extracts features directly.

show abstract

Section: Resultsmentioning

confidence: 99%

Neurogenerative Disease Diagnosis in Cepstral Domain Using MFCC with Deep Learning

Alghamdi

Zakariah

Hoang

et al. 2022

Computational and Mathematical Methods in Medicine

View full text Add to dashboard Cite

show abstract

“…Voice data used in this project was accessed from the Saarbruecken voice database (SVD) hosted by Saarland University 21 . This database contains labeled healthy and pathological audio samples consisting of both sustained vowels (/a/, /i/, and /u/) and sentences 22,23 . The vowel samples were recorded in three different pitches: low, neutral, and high.…”

Section: Methodsmentioning

confidence: 99%

End‐to‐end deep learning classification of vocal pathology using stacked vowels

Liu,

Hodges,

et al. 2023

Laryngoscope Investig Oto

View full text Add to dashboard Cite

ObjectivesAdvances in artificial intelligence (AI) technology have increased the feasibility of classifying voice disorders using voice recordings as a screening tool. This work develops upon previous models that take in single vowel recordings by analyzing multiple vowel recordings simultaneously to enhance prediction of vocal pathology.MethodsVoice samples from the Saarbruecken Voice Database, including three sustained vowels (/a/, /i/, /u/) from 687 healthy human participants and 334 dysphonic patients, were used to train 1‐dimensional convolutional neural network models for multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings. Three models were trained: (1) a baseline model that analyzed individual vowels in isolation, (2) a stacked vowel model that analyzed three vowels (/a/, /i/, /u/) in the neutral pitch simultaneously, and (3) a stacked pitch model that analyzed the /a/ vowel in three pitches (low, neutral, and high) simultaneously.ResultsFor multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings, the stacked vowel model demonstrated higher performance compared with the baseline and stacked pitch models (F1 score 0.81 vs. 0.77 and 0.78, respectively). Specifically, the stacked vowel model achieved higher performance for class‐specific classification of hyperfunctional dysphonia voice samples compared with the baseline and stacked pitch models (F1 score 0.56 vs. 0.49 and 0.50, respectively).ConclusionsThis study demonstrates the feasibility and potential of analyzing multiple sustained vowel recordings simultaneously to improve AI‐driven screening and classification of vocal pathology. The stacked vowel model architecture in particular offers promise to enhance such an approach.Lay SummaryAI analysis of multiple vowel recordings can improve classification of voice pathologies compared with models using a single sustained vowel and offer a strategy to enhance AI‐driven screening of voice disorders.Level of Evidence3

show abstract

“…AI and related deep learning technologies have been shown to accelerate the time course and improve the quality of disease diagnosis and treatment monitoring. [13,14] Recently, advanced AI methods have been adopted for classifying airway-related symptoms such as cough [15,16] and deviated voice quality [17] in various clinical populations. Lean models have been proposed for the detection of cough in patients suffering from chronic cough, COPD, asthma, and lung cancer.…”

Section: Introductionmentioning

confidence: 99%

Efficient and Explainable Deep Neural Networks for Airway Symptom Detection in Support of Wearable Health Technology

Groh

Lei

Martignetti

et al. 2022

Advanced Intelligent Systems

View full text Add to dashboard Cite

Mobile health wearables are often embedded with small processors for signal acquisition and analysis. These embedded wearable systems are, however, limited with low available memory and computational power. Advances in machine learning, especially deep neural networks (DNNs), have been adopted for efficient and intelligent applications to overcome constrained computational environments. Herein, evolutionary algorithms are used to find novel DNNs that are accurate in classifying airway symptoms while allowing wearable deployment. As opposed to typical microphone‐acoustic signals, mechano‐acoustic data signals, which did not contain identifiable speech information for better privacy protection, are acquired from laboratory‐generated and publicly available datasets. The optimized DNNs had a low model file size of less than 150 kB and predicted airway symptoms of interest with 81.49% accuracy on unseen data. By performing explainable AI techniques, namely occlusion experiments and class activation maps, mel‐frequency bands up to 8,000 Hz are found as the most important feature for the classification. It is further found that DNN decisions are consistently relying on these specific features, fostering trust and transparency of the proposed DNNs. The proposed efficient and explainable DNN is expected to support edge computing on mechano‐acoustic sensing wearables for remote, long‐term monitoring of airway symptoms.

show abstract

Experimental Evaluation of Deep Learning Methods for an Intelligent Pathological Voice Detection System Using the Saarbruecken Voice Database

Cited by 25 publications

References 33 publications

Neurogenerative Disease Diagnosis in Cepstral Domain Using MFCC with Deep Learning

Neurogenerative Disease Diagnosis in Cepstral Domain Using MFCC with Deep Learning

End‐to‐end deep learning classification of vocal pathology using stacked vowels

Efficient and Explainable Deep Neural Networks for Airway Symptom Detection in Support of Wearable Health Technology

Contact Info

Product

Resources

About