Proceedings of the Fifth Workshop on Computational Linguistics And Clinical Psychology: From Keyboard to Clinic 2018
DOI: 10.18653/v1/w18-0602

A Linguistically-Informed Fusion Approach for Multimodal Depression Detection

Abstract: Automated depression detection is inherently a multimodal problem. Therefore, it is critical that researchers investigate fusion techniques for multimodal design. This paper presents the first ever comprehensive study of fusion techniques for depression detection. In addition, we present novel linguistically-motivated fusion techniques, which we find outperform existing approaches.
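The abstract compares fusion techniques without defining the baselines. As a point of reference only, the sketch below shows the two generic families that multimodal fusion studies usually start from: early (feature-level) fusion, which concatenates per-modality feature vectors before a single classifier, and late (decision-level) fusion, which combines per-modality classifier scores. It is not the paper's linguistically-informed method, which the abstract does not describe. The function names, the modality keys, and the weighted-average combination rule are illustrative assumptions.

```python
import numpy as np

def early_fusion(features_by_modality):
    """Feature-level fusion: concatenate per-modality vectors into one vector.

    Assumes a dict such as {"audio": [...], "video": [...], "text": [...]};
    the concatenation order follows the dict's insertion order.
    """
    return np.concatenate(
        [np.asarray(v, dtype=float).ravel() for v in features_by_modality.values()]
    )

def late_fusion(scores_by_modality, weights=None):
    """Decision-level fusion: weighted average of per-modality scores.

    Each score is assumed to be a probability of the positive (depressed)
    class from a separate unimodal classifier; equal weights by default.
    """
    names = list(scores_by_modality)
    scores = np.array([scores_by_modality[n] for n in names], dtype=float)
    w = np.array([1.0 if weights is None else weights[n] for n in names], dtype=float)
    return float(np.dot(w, scores) / w.sum())

# Usage with made-up numbers (hypothetical, not results from the paper):
fused_vector = early_fusion({"audio": [0.1, 0.2], "video": [0.3], "text": [0.4, 0.5, 0.6]})
fused_score = late_fusion({"audio": 0.2, "video": 0.4, "text": 0.9})
```

Early fusion lets one model learn cross-modal interactions but needs aligned, same-granularity inputs; late fusion tolerates modalities with different rates or missing streams at the cost of modelling them independently.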

Cited by 24 publications (23 citation statements)
References 21 publications (35 reference statements)

“…With increasing loneliness scores, speech responses tended to have less inflections and longer pauses in prosodic features; reduced second formant frequencies and variances of the speech spectrum (ΔMFCCs) in acoustic features; and fewer positive words and more filler words in linguistic features. All these trends in their changes were consistent with those observed in individuals with changes in emotional states and mental health conditions, especially those reported in previous studies on depressed speech [for F2 (35, 36, 40); for the variance of ΔMFCCs (49, 50); for pitch variation (41, 50); for pauses (34, 38); for positive words (33, 37); for filler words (55–57)]. This result may be reasonable because loneliness and depression are different constructs but closely correlated with each other (11).…”
Section: Discussion | Citation type: supporting
confidence: 90%
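The quoted passage uses the variance of ΔMFCCs as an acoustic marker. A minimal sketch of that feature follows, assuming MFCCs have already been computed by a separate front end as an array of shape (n_frames, n_coeffs). It uses the common regression formula for delta coefficients, d_t = Σ_{n=1..N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²), with edge frames repeated at the boundaries; the window half-width N = 2 and the function names are assumptions, not details taken from the cited studies.

```python
import numpy as np

def delta(features, N=2):
    """Regression-based delta coefficients along the time axis.

    features: array of shape (n_frames, n_coeffs), e.g. precomputed MFCCs.
    Boundary frames are handled by repeating the first/last frame (edge padding).
    """
    features = np.asarray(features, dtype=float)
    T = features.shape[0]
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features)
    for n in range(1, N + 1):
        # padded[t + N] is frame t, so these slices are c_{t+n} and c_{t-n}.
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom

def delta_variance(mfccs, N=2):
    """Per-coefficient variance of the delta-MFCC trajectory over one response."""
    return delta(mfccs, N).var(axis=0)
```

A lower value from `delta_variance` means the spectrum changes less from frame to frame, which is the "reduced variance of ΔMFCCs" trend the passage describes.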
“…For example, several studies reported that depressed individuals tended to use more negative words and fewer positive words than non-depressed individuals (33, 37). Filler words are commonly found in spontaneous speech and have been suggested as important signatures for detecting depression (55–57). We thus used the number of positive and negative words and the proportion of filler words as linguistic features.…”
Section: Methods | Citation type: mentioning
confidence: 99%
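The three linguistic features named in the passage above (positive-word count, negative-word count, and filler-word proportion) can be sketched as follows. The word lists are hypothetical toy lexicons for illustration; the quoted study does not say which lexicons it used, and a real pipeline would substitute an established sentiment lexicon and a filler inventory matched to its transcription conventions. Only single-token fillers are handled here.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "happy", "great", "fine", "love"}
NEGATIVE = {"sad", "bad", "tired", "lonely", "hate"}
FILLERS = {"um", "uh", "er", "ah", "hmm"}

def linguistic_features(transcript):
    """Count positive/negative words and the proportion of filler tokens."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n = len(tokens)
    return {
        "positive_count": sum(t in POSITIVE for t in tokens),
        "negative_count": sum(t in NEGATIVE for t in tokens),
        # Proportion relative to all tokens; 0.0 for an empty transcript.
        "filler_proportion": sum(t in FILLERS for t in tokens) / n if n else 0.0,
    }

example = linguistic_features("Um, I feel good but, uh, kind of sad.")
```

Counts and a proportion are used together because filler frequency scales with response length, whereas the sentiment counts in the passage are reported as raw numbers.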
“…Computing Fixed-Length Feature Vectors for Each Video. Features extracted from the facial affect, visual, and vocal modalities in each video are frame-by-frame and depend on the length of videos. In order to prepare these features for use in binary deception classification models, we represented each feature as a fixed-length vector of statistical attributes that captured the feature's temporal behavior and distribution during the variable-length videos, similar to [43,51]. For each of the facial affect, visual, and vocal features, we used the TsFresh toolkit [11] to compute 130 timeseries attributes, including the following statistical measures: mean, median, standard deviation, skew, kurtosis, maximum, minimum, sum of values, linear trends, autocorrelation with different lags, and the changes among values within different quantile ranges.…”
Section: 25 | Citation type: mentioning
confidence: 99%
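The passage above turns variable-length, frame-by-frame feature streams into fixed-length vectors. The quoted work used the TsFresh toolkit itself; the sketch below is a hand-rolled NumPy stand-in that computes only a small subset of comparable statistics (mean, median, standard deviation, skew, excess kurtosis, maximum, minimum, sum, linear-trend slope, autocorrelation at a few lags, and a simplified "change within a quantile corridor"). The lag set, the 0.2 to 0.8 quantile corridor, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def summarize_series(x, lags=(1, 2, 3), q=(0.2, 0.8)):
    """Map a variable-length 1-D series to a fixed-length vector of statistics."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    c = x - mean
    skew = (c**3).mean() / std**3 if std > 0 else 0.0
    kurt = (c**4).mean() / std**4 - 3.0 if std > 0 else 0.0  # excess kurtosis
    slope = np.polyfit(np.arange(x.size), x, 1)[0] if x.size > 1 else 0.0
    # Sample autocorrelation at each lag, normalised by the series variance.
    acf = [
        (c[:-lag] * c[lag:]).mean() / std**2 if std > 0 and lag < x.size else 0.0
        for lag in lags
    ]
    # Mean absolute change between consecutive frames that both lie
    # inside the [q_low, q_high] quantile corridor (simplified).
    lo, hi = np.quantile(x, q)
    inside = (x >= lo) & (x <= hi)
    both = inside[:-1] & inside[1:]
    diffs = np.abs(np.diff(x))[both]
    corridor_change = diffs.mean() if diffs.size else 0.0
    return np.array([mean, np.median(x), std, skew, kurt, x.max(), x.min(),
                     x.sum(), slope, *acf, corridor_change])

def video_vector(frames):
    """frames: (n_frames, n_features) -> concatenation of per-feature summaries."""
    frames = np.asarray(frames, dtype=float)
    return np.concatenate(
        [summarize_series(frames[:, j]) for j in range(frames.shape[1])]
    )
```

Because every series is reduced to the same 13 statistics, two videos with different frame counts but the same number of per-frame features yield vectors of identical length, which is what a standard binary classifier requires.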
“…Subsequently, there have been efforts (Narayanan and Georgiou, 2013) to automate this behavior annotation (or coding) process using machine learning so that rapid and inexpensive feedback can be provided to the stakeholders. Previous work has shown that automated coding systems are effective at quantifying behaviors from speech and spoken language such as Negativity (Georgiou et al, 2011; Black et al, 2013; Chakravarthula et al, 2015a; Tseng et al, 2017), Depression (Gupta et al, 2014; Morales et al, 2018) and Empathy (Xiao et al, 2012; Gibson et al, 2016; Pérez-Rosas et al, 2017). However, there are some critical aspects of this behavior assessment process which humans can handle naturally and easily but machines still cannot, one of which is the notion of how much to observe in order to reliably assess behavior.…”
Section: Introduction | Citation type: mentioning
confidence: 99%