The properties of acoustic speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients and the speech recordings were made during patients’ clinical interviews or fixed-text reading sessions. Symptoms of depression often first appear during adolescence at a time when the voice is changing, in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. This study investigated acoustic correlates of depression in a large sample of 139 adolescents (68 clinically depressed and 71 controls). Speech recordings were made during naturalistic interactions between adolescents and their parents. Prosodic, cepstral, spectral, and glottal features, as well as features derived from the Teager energy operator (TEO), were tested within a binary classification framework. Strong gender differences in classification accuracy were observed. The TEO-based features clearly outperformed all other features and feature combinations, providing classification accuracy ranging between 81%–87% for males and 72%–79% for females. Close, but slightly less accurate, results were obtained by combining glottal features with prosodic and spectral features (67%–69% for males and 70%–75% for females). These findings indicate the importance of nonlinear mechanisms associated with the glottal flow formation as cues for clinical depression.
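For readers unfamiliar with the Teager energy operator, the following is a minimal sketch of its standard discrete form, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). It shows only the raw operator. The TEO-based features used in the study involve further processing that the abstract does not spell out, so the function name `teager_energy` and the test signal below are illustrative assumptions, not the authors' pipeline.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, because the first and last samples
    have no neighbor on one side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative signal (an assumption, not study data): a pure sinusoid.
# For A*cos(omega*n + phi) the operator output is the constant A^2 * sin(omega)^2.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n + 0.5) for n in range(50)]
energy = teager_energy(signal)
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Because the output depends on both amplitude and frequency, the operator responds to modulations that a plain squared-amplitude energy measure would miss, which may be one reason it is used to probe nonlinear aspects of speech production.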
In this paper, we report how classification accuracy on a clinical speech dataset changes when acoustic low-level descriptors (LLDs) from the prosodic group (i.e. pitch, formants, energy, jitter, shimmer) and the spectral group (i.e. spectral flux, centroid, entropy and roll-off), along with their delta (Δ) and delta-delta (Δ-Δ) coefficients, are added to two baseline feature sets: Mel frequency cepstral coefficients and the Teager energy critical-band-based autocorrelation envelope (TEO-CB-Auto-Env). The LLDs that increased accuracy when added to these baseline features were then modeled together using Gaussian mixture models and tested. A clinical data set of speech from 139 adolescents, including 68 (49 girls and 19 boys) diagnosed as clinically depressed, was used in the classification experiments. For male subjects, the combination of (TEO-CB-Auto-Env + Δ + Δ-Δ) + F0 + (LogE + Δ + Δ-Δ) + (Shimmer + Δ) + Spectral Flux + Spectral Roll-off gave the highest classification rate of 77.82%, while for female subjects, using TEO-CB-Auto-Env gave an accuracy of 74.74%.
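The delta and delta-delta coefficients mentioned above are time derivatives of a per-frame feature track. The abstract does not say how they were computed, so the sketch below assumes the widely used regression formula d(t) = Σ n·(c(t+n) − c(t−n)) / (2·Σ n²) with edge frames repeated. The function name `delta`, the window `width`, and the sample track are all illustrative assumptions.

```python
def delta(frames, width=2):
    """Regression-based delta of a per-frame feature track.

    frames: list of floats, one value per analysis frame.
    width:  number of frames on each side used in the regression (assumed 2).
    Edge frames are repeated so the output has the same length as the input.
    """
    total = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(total):
        num = 0.0
        for n in range(1, width + 1):
            ahead = frames[min(t + n, total - 1)]
            behind = frames[max(t - n, 0)]
            num += n * (ahead - behind)
        out.append(num / denom)
    return out


# Illustrative track (not study data): a linear ramp has slope 1 per frame.
track = [float(i) for i in range(10)]
d1 = delta(track)   # delta: close to 1.0 away from the edges
d2 = delta(d1)      # delta-delta: close to 0.0 away from the edges
```

Stacking a feature with its delta and delta-delta, as in (LogE + Δ + Δ-Δ), triples the dimensionality of that feature and adds information about how it changes over time.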
We proposed a framework to detect the video contents of depressed and non-depressed subjects. First, we characterized the expressed emotions in the video stream using Gabor wavelet features extracted at facial landmarks, which were detected using a landmark model matching algorithm. Depressed and non-depressed class models were constructed using Gaussian mixture models. Using 8 hours of video recordings (one hour per subject), balanced for both gender and class, we examined the effectiveness of gender-based and gender-independent modeling approaches for classifying depressed and non-depressed content. We found that the gender-based content modeling approach improved classification accuracy by 6% compared to the gender-independent approach, achieving 78.6% average accuracy.
With suicidal behavior being linked to depression that starts early in a person's life, many investigators are trying to find early tell-tale signs to assist psychologists in detecting clinical depression through acoustic analysis of a patient's speech. The purpose of this paper was to study the effectiveness of Mel frequency cepstral coefficients (MFCCs) in capturing the overall mental state of a patient through analysis of the various vocal emotions displayed during 20 minutes of problem-solving interaction sessions. We also propose both gender-based and gender-independent clinical depression models using Gaussian mixture models. Experiments on a corpus of 139 adolescent subjects indicate that incorporating both the first and second time derivatives of MFCCs can improve the overall classification accuracy by 3%. Gender differences proved to be a factor in improving the detection of clinically depressed subjects: gender-based models outperformed gender-independent models by 8%.
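As a rough illustration of classification with per-class Gaussian mixture models, the sketch below scores a sequence of frames under a "depressed" model and a "control" model and picks the class with the higher total log-likelihood. It is one-dimensional for brevity, and every mixture weight, mean, and variance is a hypothetical value chosen for the example. Nothing here is taken from the papers, and no training step is shown.

```python
import math


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture."""
    terms = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models):
    """Return the label whose GMM gives the highest summed log-likelihood."""
    scores = {
        label: sum(gmm_loglik(x, *params) for x in frames)
        for label, params in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical model parameters: (weights, means, variances) per class.
models = {
    "depressed": ([0.5, 0.5], [-1.0, 0.0], [1.0, 1.0]),
    "control": ([0.5, 0.5], [2.0, 3.0], [1.0, 1.0]),
}
label = classify([2.1, 2.8, 3.2], models)
```

In a gender-based setup, one would keep a separate pair of class models per gender and select the pair from the speaker's known gender before calling `classify`. A gender-independent setup would use a single pair for everyone.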