Depression Assessment by Fusing High and Low Level Features from Audio, Video, and Text

Pampouchidou, Anastasia; Simantiraki, Olympia; Fazlollahi, Amir; Pediaditis, Matthew; Manousos, Dimitris; Roniotis, Alexandros; Giannakakis, Giorgos; Mériaudeau, Fabrice; Simos, Panagiotis G.; Marias, Kostas; Yang, Fan; Tsiknakis, Manolis

doi:10.1145/2988257.2988266

Cited by 81 publications

(36 citation statements)

References 14 publications


“…Speech technology offers promise because speaking is natural, can be used at a distance, requires no special training, and carries information about a speaker's state. A growing line of AI research has shown that depression can be detected from speech signals using natural language processing (NLP), acoustic models, and multimodal models [3], [4], [5], [6], [7], [8], [9], [10]. Common evaluations with shared data sets, features, and tools have recently led to progress, especially in modeling methods [11], [12], [13], [14], [15].…”

Section: Introduction (mentioning)

confidence: 99%

Optimizing Speech-Input Length for Speaker-Independent Depression Classification

Rutowski, Harati, Lü, et al. 2019

Interspeech 2019


Machine learning models for speech-based depression classification offer promise for health care applications. Despite growing work on depression classification, little is understood about how the length of speech input affects model performance. We analyze results for speaker-independent depression classification using a corpus of over 1400 hours of speech from a human-machine health screening application. We examine performance as a function of response input length for two NLP systems that differ in overall performance. Results for both systems show that performance depends on natural length, elapsed length, and ordering of the response within a session. Systems share a minimum length threshold, but differ in a response saturation threshold, with the latter higher for the better system. At saturation it is better to pose a new question to the speaker than to continue the current response. These and additional reported results suggest how applications can be better designed to both elicit and process optimal input lengths for depression classification.


“…The present work introduced the GMHI, a novel variant of MHI and reported on the first application of LMHI [23] on the AVEC dataset. Another novelty of the proposed work is that categorical assessment of depressive symptomatology was performed using deep learning methods, for the first time on this dataset.…”

Section: Discussion (mentioning)

confidence: 99%

“…In the DSC of AVEC'14, Pérez Espinoza et al [21] employed MHI, and for the same challenge, Jan et al [22] proposed the 1-D MHH, an extension of MHI, which is computed on the feature vector sequence instead of the intensity image. As part of their DSC-AVEC'16 participation, Pampouchidou et al [23] introduced Landmark Motion History Images (LMHI), which instead of considering intensities from image sequences, considers sequences of facial landmarks.…”

Section: Motion History Image (mentioning)

confidence: 99%


Quantitative comparison of motion history image variants for video-based depression assessment

Pampouchidou, Pediaditis, Maridaki, et al. 2017

J Image Video Proc.

Self Cite


Depression is the most prevalent mood disorder and a leading cause of disability worldwide. Automated video-based analyses may afford objective measures to support clinical judgments. In the present paper, categorical depression assessment is addressed by proposing a novel variant of the Motion History Image (MHI) which considers Gabor-inhibited filtered data instead of the original image. Classification results obtained with this method on the AVEC'14 dataset are compared to those derived using (a) an earlier MHI variant, the Landmark Motion History Image (LMHI), and (b) the original MHI. The different motion representations were tested in several combinations of appearance-based descriptors, as well as with the use of convolutional neural networks. The F1 score of 87.4% achieved in the proposed work outperformed previously reported approaches.


“…Further, we add the presence of each topic to the feature vector, because each interview only covers a few topics and the topic presence might be correlated to the subject's status. Finally, gender is also attached to the feature vector similar to the work in [22] and [14], where the authors report that gender information can greatly improve the classification performance. Figure 3 illustrates the structure of the feature vector and Table 2 shows the dimension of each feature category in the feature vector.…”

Section: Topic-wise Feature Mapping (mentioning)

confidence: 96%

Topic Modeling Based Multi-modal Depression Detection

Gong, Poellabauer 2017

Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge



Major depressive disorder is a common mental disorder that affects almost 7% of the adult U.S. population. The 2017 Audio/Visual Emotion Challenge (AVEC) asks participants to build a model to predict depression levels based on the audio, video, and text of an interview ranging between 7-33 minutes. Since averaging features over the entire interview will lose most temporal information, how to discover, capture, and preserve useful temporal details for such a long interview are significant challenges. Therefore, we propose a novel topic modeling based approach to perform context-aware analysis of the recording. Our experiments show that the proposed approach outperforms context-unaware methods and the challenge baselines for all metrics.
