Background The measurement and monitoring of generalized anxiety disorder requires frequent interaction with psychiatrists or psychologists. Access to mental health professionals is often difficult because of high costs or insufficient availability. The ability to assess generalized anxiety disorder passively and at frequent intervals could be a useful complement to conventional treatment and help with relapse monitoring. Prior work suggests that higher anxiety levels are associated with features of human speech. As such, monitoring speech using personal smartphones or other wearable devices may be a means to achieve passive anxiety monitoring. Objective This study aims to validate the association of previously suggested acoustic and linguistic features of speech with anxiety severity. Methods A large number of participants (n=2000) were recruited and participated in a single web-based study session. Participants completed the Generalized Anxiety Disorder 7-item scale assessment and provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test. Acoustic and linguistic speech features were a priori selected based on the existing speech and anxiety literature, along with related features. Associations between speech features and anxiety levels were assessed using age and personal income as covariates. Results Word count and speaking duration were negatively correlated with anxiety scores (r=–0.12; P<.001), indicating that participants with higher anxiety scores spoke less. Several acoustic features were also significantly (P<.05) associated with anxiety, including the mel-frequency cepstral coefficients, linear prediction cepstral coefficients, shimmer, fundamental frequency, and first formant. In contrast to previous literature, second and third formant, jitter, and zero crossing rate for the z score of the power spectral density acoustic features were not significantly associated with anxiety. 
Linguistic features, including negative-emotion words, were also associated with anxiety (r=0.10; P<.001). In addition, some linguistic relationships were sex dependent. For example, the count of words related to power was positively associated with anxiety in women (r=0.07; P=.03), whereas it was negatively associated with anxiety in men (r=–0.09; P=.01). Conclusions Both acoustic and linguistic speech measures are associated with anxiety scores. The amount of speech, acoustic quality of speech, and gender-specific linguistic characteristics of speech may be useful as part of a system to screen for anxiety, detect relapse, or monitor treatment.
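As an illustration of assessing an association "using age and personal income as covariates", the following is a minimal sketch of a partial correlation computed with the standard recursive formula. The abstract does not state which covariate-adjustment procedure the authors used, so partial correlation is an assumption here, and the variable names and synthetic data (`age`, `income`, `gad7`, `word_count`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, covariates):
    """Correlation of x and y after controlling for each sequence in `covariates`.

    Uses the recursive formula: remove one covariate z at a time,
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    where every correlation on the right already controls for the remaining covariates.
    """
    if not covariates:
        return pearson(x, y)
    *rest, z = covariates
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical synthetic data (NOT the study's data): standardized covariates,
# an anxiety score, and a speech feature that depends weakly on both.
rng = random.Random(0)
n = 500
age = [rng.gauss(0, 1) for _ in range(n)]
income = [rng.gauss(0, 1) for _ in range(n)]
gad7 = [0.3 * a - 0.3 * i + rng.gauss(0, 1) for a, i in zip(age, income)]
word_count = [-0.15 * g + 0.2 * a + rng.gauss(0, 1) for g, a in zip(gad7, age)]

print("raw r:     ", round(pearson(word_count, gad7), 3))
print("partial r: ", round(partial_corr(word_count, gad7, [age, income]), 3))
```

The recursion keeps the sketch dependency-free; with many covariates, residualizing both variables on the covariates by least squares would be the more usual route.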
Background Frequent interaction with mental health professionals is required to screen, diagnose, and track mental health disorders. However, high costs and insufficient access can make frequent interactions difficult. The ability to assess a mental health disorder passively and at frequent intervals could be a useful complement to the conventional treatment. It may be possible to passively assess clinical symptoms with high frequency by characterizing speech alterations collected using personal smartphones or other wearable devices. The association between speech features and mental health disorders can be leveraged as an objective screening tool. Objective This study aimed to evaluate the performance of a model that predicts the presence of generalized anxiety disorder (GAD) from acoustic and linguistic features of impromptu speech on a larger and more generalizable scale than prior studies did. Methods A total of 2000 participants were recruited, and they participated in a single web-based session. They completed the Generalized Anxiety Disorder 7-item scale assessment and provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test. We used the linguistic and acoustic features that were found to be associated with anxiety disorders in previous studies along with demographic information to predict whether participants fell above or below the screening threshold for GAD based on the Generalized Anxiety Disorder 7-item scale threshold of 10. Separate models for each sex were also evaluated. We reported the mean area under the receiver operating characteristic curve (AUROC) from a repeated 5-fold cross-validation to evaluate the performance of the models. Results A logistic regression model using only acoustic and linguistic speech features achieved a significantly greater prediction accuracy than a random model did (mean AUROC 0.57, SD 0.03; P<.001). 
When separately assessing samples from female participants, we observed a mean AUROC of 0.55 (SD 0.05; P=.01). The model constructed from the samples from male participants achieved a mean AUROC of 0.57 (SD 0.07; P=.002). The mean AUROC increased to 0.62 (SD 0.03; P<.001) on the all-sample data set when demographic information (age, sex, and income) was included, indicating the importance of demographics when screening for anxiety disorders. The performance also increased for the female sample to a mean of 0.62 (SD 0.04; P<.001) when using demographic information (age and income). An increase in performance was not observed when demographic information was added to the model constructed from the male samples. Conclusions A logistic regression model using acoustic and linguistic speech features, which have been suggested to be associated with anxiety disorders in prior studies, can achieve above-random accuracy for predicting GAD. Importantly, the addition of basic demographic variables further improves model performance, suggesting a role for speech and demographic information to be used as automated, objective screeners of GAD.
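The evaluation described above (mean and SD of AUROC over a repeated 5-fold cross-validation of a logistic regression) can be sketched roughly as follows. This is a dependency-free illustration on hypothetical synthetic data, not the authors' pipeline: the gradient-descent fit, the number of repeats, the learning rate, and the two toy features are all assumptions, and in practice a library such as scikit-learn would normally be used instead.

```python
import math
import random
import statistics


def auroc(labels, scores):
    """AUROC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, lr=0.1, epochs=100):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def repeated_kfold_auroc(X, y, k=5, repeats=3, seed=0):
    """Mean and SD of held-out AUROC over `repeats` reshuffled k-fold splits.

    Folds are not stratified here, so every test fold must contain both classes.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
            scores = [sum(wj * xj for wj, xj in zip(w, X[i])) + b for i in test]
            aucs.append(auroc([y[i] for i in test], scores))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Hypothetical synthetic data: one informative feature, one pure-noise feature.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
y = [1 if row[0] + rng.gauss(0, 1) > 0 else 0 for row in X]

mean_auc, sd_auc = repeated_kfold_auroc(X, y)
print(f"mean AUROC {mean_auc:.2f}, SD {sd_auc:.2f}")
```

Reporting the SD across folds alongside the mean, as the abstract does, gives a sense of how much the estimate depends on the particular split.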
Background The ability to automatically detect anxiety disorders from speech could be useful as a screening tool for an anxiety disorder. Prior studies have shown that individual words in textual transcripts of speech have an association with anxiety severity. Transformer-based neural networks are models that have been recently shown to have powerful predictive capabilities based on the context of more than one input word. Transformers detect linguistic patterns and can be separately trained to make specific predictions based on these patterns. Objective This study aimed to determine whether a transformer-based language model can be used to screen for generalized anxiety disorder from impromptu speech transcripts. Methods A total of 2000 participants provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test (TSST). They also completed the Generalized Anxiety Disorder 7-item (GAD-7) scale. A transformer-based neural network model (pretrained on large textual corpora) was fine-tuned on the speech transcripts and the GAD-7 to predict whether a participant was above or below a screening threshold of the GAD-7. We reported the area under the receiver operating characteristic curve (AUROC) on the test data and compared the results with a baseline logistic regression model using the Linguistic Inquiry and Word Count (LIWC) features as input. Using the integrated gradient method to determine specific words that strongly affect the predictions, we inferred specific linguistic patterns that influence the predictions. Results The baseline LIWC-based logistic regression model had an AUROC value of 0.58. The fine-tuned transformer model achieved an AUROC value of 0.64. Specific words that were often implicated in the predictions were also dependent on the context. 
For example, the first-person singular pronoun “I” influenced toward an anxious prediction 88% of the time and a nonanxious prediction 12% of the time, depending on the context. Silent pauses in speech, also often implicated in predictions, influenced toward an anxious prediction 20% of the time and a nonanxious prediction 80% of the time. Conclusions There is evidence that a transformer-based neural network model has increased predictive power compared with the single word–based LIWC model. We also showed that the use of specific words in a specific context—a linguistic pattern—is part of the reason for the better prediction. This suggests that such transformer-based models could play a useful role in anxiety screening systems.
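To make the integrated gradient method mentioned above more concrete, here is a minimal sketch on a toy logistic model. The study applies the method to a fine-tuned transformer over word inputs; the tiny model, its weights, the all-zero baseline, and the midpoint-rule approximation below are illustrative assumptions only. Each attribution is the input's displacement from the baseline multiplied by the average gradient along the straight path from baseline to input.

```python
import math


def model(x, weights, bias):
    """Toy stand-in for a classifier: sigmoid of a linear score."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x, weights, bias):
    """Analytic gradient of the toy model's output with respect to each input."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum over `steps` points."""
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # position along the baseline -> input path
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        grad_sum = [g + gi for g, gi in zip(grad_sum, gradient(point, weights, bias))]
    # Scale the average gradient by the displacement from the baseline.
    return [(xi - b0) * g / steps for xi, b0, g in zip(x, baseline, grad_sum)]


# Hypothetical weights and input; a positive attribution pushes the output up.
weights, bias = [1.2, -0.7, 0.3], -0.1
x, baseline = [1.0, 2.0, 0.5], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, weights, bias)
print("attributions:", [round(a, 4) for a in attributions])
print("sum of attributions:", round(sum(attributions), 4))
print("f(x) - f(baseline): ", round(model(x, weights, bias) - model(baseline, weights, bias), 4))
```

The last two printed values should agree closely, which is the completeness property of integrated gradients. The sign of each attribution plays the role of a word pushing a prediction toward anxious or nonanxious in the abstract's description.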
BACKGROUND The measurement and monitoring of Generalized Anxiety Disorder (GAD) requires frequent interaction with psychiatrists or psychologists. Access to mental health professionals is often difficult due to high costs or insufficient availability. The ability to assess GAD passively and at frequent intervals could be a useful complement to conventional treatment and help with relapse monitoring. Prior work suggests that higher anxiety levels are associated with changes in human speech. As such, monitoring speech using personal smartphones or other wearable devices may be a means to achieve passive anxiety monitoring. OBJECTIVE To validate the association of previously suggested acoustic and linguistic features of speech with anxiety severity. METHODS A large number of participants (N=2,000) were recruited and participated in a single online study session. Participants completed the Generalized Anxiety Disorder 7-item scale (GAD-7) assessment and provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test. Acoustic and linguistic speech features were selected a priori based on the existing speech and anxiety literature, together with related features. Associations between speech features and anxiety levels were assessed with age and personal income included as covariates. RESULTS Word count and speaking duration were negatively correlated with anxiety scores (r=-0.12; P<.001), indicating that participants with higher anxiety scores spoke less. Several acoustic features were also significantly (P<.05) associated with anxiety, including the Mel Frequency Cepstral Coefficients (MFCCs), Linear Prediction Cepstral Coefficients (LPCCs), shimmer, fundamental frequency, and first formant. In contrast to previous literature, the acoustic features second and third formant, jitter, and zero crossing rate for the z score of the power spectral density (ZCR-zPSD) were not significantly associated with anxiety. 
Linguistic features, including negative emotion words, were also associated with anxiety (r=0.10; P<.001). Additionally, some linguistic relationships were sex-dependent. The number of sentences produced was strongly associated with anxiety in females (r=0.12; P<.001). The use of personal pronouns was strongly associated with anxiety in males (r=0.11; P<.001). CONCLUSIONS Both acoustic and linguistic speech measures are associated with anxiety scores. The amount of speech, acoustic quality of speech, and gender-specific linguistic characteristics of speech may be useful as part of a system to screen for anxiety, detect relapse, or monitor treatment.
BACKGROUND The ability to automatically detect anxiety disorders from speech could be useful as a screening tool for an anxiety disorder. Prior studies have shown that individual words in textual transcripts of speech have an association with anxiety severity. Transformer-based neural networks are models that have been recently shown to have powerful predictive capabilities, based on multiple input words. Transformers detect linguistic patterns and can be separately trained to make specific predictions based on those patterns. OBJECTIVE To determine whether a transformer-based language model can be used to screen for Generalized Anxiety Disorder from impromptu speech transcripts. METHODS A total of 2,000 participants provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test. They also completed the Generalized Anxiety Disorder 7-item scale (GAD-7). A transformer-based neural-network model (pre-trained on large textual corpora) was fine-tuned on the speech transcripts and the GAD-7 to predict whether a participant was above or below a screening threshold of the GAD-7. We report the area under the receiver operating characteristic curve (AUROC) on test data and compare the results with a baseline logistic regression model using the Linguistic Inquiry and Word Count (LIWC) features as input. Using the Integrated Gradient method to determine specific words that strongly affect the predictions, we infer specific linguistic patterns that influence the predictions. RESULTS The baseline LIWC-based logistic regression model had an AUROC value of 0.58. The fine-tuned transformer model achieved an AUROC value of 0.64. Specific words that were often implicated in the predictions were also dependent on the context. For example, the first-person singular pronoun “I” influenced the prediction toward anxious 88% of the time and toward nonanxious 12% of the time, depending on the context. 
Silent pauses in speech, also often implicated in predictions, influenced the prediction toward anxious 20% of the time and toward nonanxious 80% of the time. CONCLUSIONS There is evidence that a transformer-based neural network model has increased predictive power compared to the single-word-based LIWC model. We have also shown that the use of specific words in a specific context, a linguistic pattern, is part of the reason for the better prediction. This suggests that such transformer-based models could play a useful role in anxiety screening systems.