BACKGROUND
Changes in a range of speech features have been linked to neurological and mental health-related pathologies, and these changes can often be detected years before a definitive clinical diagnosis. With growing interest in using speech analysis to detect a wide variety of health conditions, and with a growing number of patients presenting with multiple health problems, it will be increasingly important to demonstrate that speech analysis can differentiate between these conditions in order to provide reliable and accurate diagnosis and assessment.
OBJECTIVE
This study takes a first step in this direction by examining the speech biosignatures of two common classes of neurological conditions: (1) mild traumatic brain injury, such as concussion, and (2) neurodegenerative disease, focusing on amyotrophic lateral sclerosis (ALS) and Parkinson's disease (PD).
METHODS
This study uses two types of speech tests that are well known to and frequently used by speech-language pathologists: a diadochokinetic test (“PaTaKa test”) and a sustained vowel test. It draws on data from over 230 participants and examines 25 temporal and 12 spectral features of the speech signal. We trained classification models using four popular machine learning algorithms, Support Vector Machine (SVM), Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost), on three speech feature sets selected according to statistical significance in an analysis of variance (ANOVA).
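To make the pipeline concrete, the sketch below illustrates one way the ANOVA-based feature selection and the four classifiers could be set up in Python with scikit-learn and xgboost; the feature matrix, labels, and injected group effect are placeholder values for illustration only, not the study's data or code.

```python
# Illustrative sketch (not the study's code): per-feature ANOVA followed by
# construction of the three feature sets and the four classifiers.
import numpy as np
from scipy.stats import f_oneway
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Placeholder data: rows are participants, columns are the 37 temporal and
# spectral features; labels are 0 = healthy control, 1 = concussion,
# 2 = neurodegenerative disease.
rng = np.random.default_rng(0)
X = rng.normal(size=(230, 37))
y = rng.integers(0, 3, size=230)
X[:, :10] += 0.8 * y[:, None]  # give the first 10 placeholder features a group effect

# One-way ANOVA per feature across the three diagnostic groups.
p_values = np.array([
    f_oneway(*(X[y == g, j] for g in np.unique(y))).pvalue
    for j in range(X.shape[1])
])

# The three feature sets compared in the study: all 37 features,
# those with p < 0.05, and the 5 with the smallest p-values.
feature_sets = {
    "all_37": np.arange(X.shape[1]),
    "p<0.05": np.where(p_values < 0.05)[0],
    "top_5": np.argsort(p_values)[:5],
}

# The four classifiers named above, with default hyperparameters.
classifiers = {
    "SVM": SVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "XGBoost": XGBClassifier(),
}
```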
RESULTS
Of the 37 features, more than 20 were statistically significant in differentiating between concussions, neurodegenerative diseases, and healthy controls. The paper further compares the performance of the classification models trained on these features. For the PaTaKa test, when the full feature set was used, Decision Tree obtained the highest F1-score (0.98) among the four classifiers. When only the features with ANOVA p-values below 0.05 were used, XGBoost obtained the highest F1-score (0.95). When the 5 most statistically significant features were used for model training, Random Forest and XGBoost both reached a maximum F1-score of 0.93. None of the classifiers achieved an F1-score above 0.7 with features obtained from the sustained vowel test.
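As a rough illustration of how such a comparison could be run, the continuation of the sketch above evaluates each classifier on each of the three feature sets with the F1-score; the single train/test split and the macro averaging of F1 are assumptions, since the abstract does not specify the evaluation protocol.

```python
# Continuing the sketch: compare every classifier on every feature set.
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hold out a stratified test split from the placeholder data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

for set_name, cols in feature_sets.items():
    for clf_name, clf in classifiers.items():
        clf.fit(X_train[:, cols], y_train)
        pred = clf.predict(X_test[:, cols])
        # Macro-averaged F1 over the three classes; the abstract does not say
        # how F1 was aggregated, so this averaging choice is an assumption.
        print(f"{set_name:>7} | {clf_name:<12} F1 = "
              f"{f1_score(y_test, pred, average='macro'):.2f}")
```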
CONCLUSIONS
This study demonstrated that the speech signatures of individuals with concussions, individuals with neurodegenerative conditions, and healthy controls differ in ways that allow unique speech patterns associated with each condition to be identified, providing opportunities for the development of speech-based biomarkers for early detection and more accurate diagnosis. The findings also indicate that it is feasible to use speech analysis to detect multiple health conditions within the same person.