2019
DOI: 10.48550/arxiv.1909.01417
Preprint

Multi-level Attention network using text, audio and video for Depression Prediction

Cited by 3 publications (3 citation statements)
References 0 publications
“…Previous studies in depression prediction using speech [15,16] have shown the superiority of MFCCs over other audio-based features such as the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) [7] and DEEP SPECTRUM features [1]. Huang et al. [9] showed in their depression classification study that coordination features computed from MFCCs perform better than formant and eGeMAPS features.…”
Section: Mel-Frequency Cepstral Coefficients (MFCCs) | Citation type: mentioning
confidence: 99%
“…14 A study used the Extended Distress Analysis Interview Corpus Database to predict depression. 11 The combination of text, audio, and video examinations resulted in the best predictive accuracy.…”
Section: Introduction | Citation type: mentioning
confidence: 98%
“…4,10 These changes eventually alter vocal expression, facial gestures, and speech content. 11 Among these signals, voice and facial expressions are relatively difficult to hide because emotions are often expressed subconsciously. 12 Additionally, they can be easily recorded using various digital imaging and voice recording devices.…”
Section: Introduction | Citation type: mentioning
confidence: 99%