Automatic depression assessment based on visual cues is a rapidly growing research domain. The present exhaustive review of existing approaches as reported in over sixty publications during the last ten years focuses on image processing and machine learning algorithms. Visual manifestations of depression, various procedures used for data collection, and existing datasets are summarized. The review outlines methods and algorithms for visual feature extraction, dimensionality reduction, decision methods for classification and regression approaches, as well as different fusion strategies. A quantitative meta-analysis of reported results, relying on performance metrics robust to chance, is included, identifying general trends and key unresolved issues to be considered in future studies of automatic depression assessment utilizing visual cues alone or in combination with vocal or verbal cues.
Depression is a major cause of disability world-wide. The present paper reports on the results of our participation in the depression sub-challenge of the sixth Audio/Visual Emotion Challenge (AVEC 2016), which was designed to compare feature modalities (audio, visual, interview transcript-based) in gender-based and gender-independent modes using a variety of classification algorithms. In our approach, both high- and low-level features were assessed in each modality. Audio features were extracted from the low-level descriptors provided by the challenge organizers. Several visual features were extracted and assessed, including dynamic characteristics of facial elements (using Landmark Motion History Histograms and Landmark Motion Magnitude), global head motion, and eye blinks. These features were combined with statistically derived features from pre-extracted features (emotions, action units, gaze, and pose). Both speech rate and word-level semantic content were also evaluated. Classification results are reported using four different classification schemes: i) gender-based models for each individual modality, ii) the feature fusion model, iii) the decision fusion model, and iv) the posterior probability classification model. Proposed approaches outperforming the reference classification accuracy include the one utilizing statistical descriptors of low-level audio features. This approach achieved F1-scores of 0.59 for identifying depressed and 0.87 for identifying non-depressed individuals on the development set, and 0.52 and 0.81, respectively, on the test set.
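The abstract above reports a separate F1-score for the depressed and the non-depressed class. As a minimal sketch of how such per-class scores can be computed, assuming binary labels where 1 means depressed and 0 means non-depressed: the function name `f1_for_class` and the toy labels are hypothetical and are not taken from the paper or from the AVEC 2016 data.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct detections, so precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only): 1 = depressed, 0 = non-depressed
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

f1_depressed = f1_for_class(y_true, y_pred, positive=1)
f1_non_depressed = f1_for_class(y_true, y_pred, positive=0)
print(f1_depressed, f1_non_depressed)
```

Reporting both values matters when the classes are imbalanced, since a high score on the majority class can mask weak detection of the minority class.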
Depression is one of the most prevalent mental disorders, burdening many people world-wide. A system with the potential to serve as a decision support system is proposed, based on novel features extracted from facial expression geometry and speech, by interpreting non-verbal manifestations of depression. The proposed system has been tested in both gender-independent and gender-based modes, and with different fusion methods. The algorithms were evaluated for several combinations of parameters and classification schemes, on the dataset provided by the Audio/Visual Emotion Challenge of 2013 and 2014. The proposed framework achieved a precision of 94.8% for detecting persons achieving high scores on a self-report scale of depressive symptomatology. Optimal system performance was obtained using a nearest neighbour classifier on the decision fusion of geometrical features in the gender-independent mode and audio-based features in the gender-based mode; the single visual and audio decisions were combined with the binary OR operation.
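The decision fusion described above combines one binary visual decision and one binary audio decision with OR, so a person is flagged when either modality flags them. A minimal sketch under that reading, where the function names `or_fusion` and `precision` and all toy decisions are hypothetical and do not reproduce the paper's classifiers or results:

```python
def or_fusion(visual_decisions, audio_decisions):
    """Fuse per-person binary decisions: flag if either modality flags."""
    return [int(v or a) for v, a in zip(visual_decisions, audio_decisions)]


def precision(y_true, y_pred):
    """Fraction of flagged persons who are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy decisions (illustrative only): 1 = high depressive symptomatology
visual = [1, 0, 0, 1, 0]
audio = [0, 0, 1, 1, 0]
y_true = [1, 0, 1, 0, 0]

fused = or_fusion(visual, audio)
print(fused, precision(y_true, fused))
```

OR fusion can only add positive decisions relative to either single modality, so it tends to raise recall and may lower precision; whether that trade-off helps depends on how complementary the two modalities are.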