Background
Frequent interaction with mental health professionals is required to screen, diagnose, and track mental health disorders. However, high costs and insufficient access can make frequent interactions difficult. The ability to assess a mental health disorder passively and at frequent intervals could be a useful complement to the conventional treatment. It may be possible to passively assess clinical symptoms with high frequency by characterizing speech alterations collected using personal smartphones or other wearable devices. The association between speech features and mental health disorders can be leveraged as an objective screening tool.
Objective
This study aimed to evaluate the performance of a model that predicts the presence of generalized anxiety disorder (GAD) from acoustic and linguistic features of impromptu speech on a larger and more generalizable scale than prior studies did.
Methods
A total of 2000 participants were recruited, and they participated in a single web-based session. They completed the Generalized Anxiety Disorder-7 item scale assessment and provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test. We used the linguistic and acoustic features that were found to be associated with anxiety disorders in previous studies along with demographic information to predict whether participants fell above or below the screening threshold for GAD based on the Generalized Anxiety Disorder-7 item scale threshold of 10. Separate models for each sex were also evaluated. We reported the mean area under the receiver operating characteristic (AUROC) from a repeated 5-fold cross-validation to evaluate the performance of the models.
Results
A logistic regression model using only acoustic and linguistic speech features achieved a significantly greater prediction accuracy than a random model did (mean AUROC 0.57, SD 0.03; P<.001). When separately assessing samples from female participants, we observed a mean AUROC of 0.55 (SD 0.05; P=.01). The model constructed from the samples from male participants achieved a mean AUROC of 0.57 (SD 0.07; P=.002). The mean AUROC increased to 0.62 (SD 0.03; P<.001) on the all-sample data set when demographic information (age, sex, and income) was included, indicating the importance of demographics when screening for anxiety disorders. The performance also increased for the female sample to a mean of 0.62 (SD 0.04; P<.001) when using demographic information (age and income). An increase in performance was not observed when demographic information was added to the model constructed from the male samples.
Conclusions
A logistic regression model using acoustic and linguistic speech features, which have been suggested to be associated with anxiety disorders in prior studies, can achieve above-random accuracy for predicting GAD. Importantly, the addition of basic demographic variables further improves model performance, suggesting a role for speech and demographic information to be used as automated, objective screeners of GAD.