2021
DOI: 10.1109/taffc.2018.2870398

Integrating Deep and Shallow Models for Multi-Modal Depression Analysis—Hybrid Architectures

Cited by 78 publications (38 citation statements)
References 67 publications
“…In this work, we make use of our previously proposed speech acoustic feature for depression severity prediction from speech [34], [45]. The feature set is composed of the Geneva Minimalistic Acoustic Parameter Set (GeMAPS) and the INTERSPEECH Challenges feature sets.…”
Section: Feature Generation, A. Speech Acoustic Features
confidence: 99%
“…In total, 238 low-level descriptors (LLDs), consisting of 211 spectral and energy-related features and 27 voicing-related dynamic features, are first extracted; then 25 statistical functionals and 4 regression functionals are applied, resulting in a 6902-dimensional feature vector for each speech segment. The reader can refer to our previous work [34], [45] for more details. In this work, we consider the proposed features as speech descriptors to be learned by GAN.…”
Section: Feature Generation, A. Speech Acoustic Features
confidence: 99%
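The 6902-dimensional figure in the excerpt follows directly from the counts it reports. A minimal Python sketch of the arithmetic (the variable names are ours, introduced only for illustration):

```python
# Per-segment feature dimensionality as described in the citation:
# 238 LLDs (211 spectral/energy + 27 voicing-related dynamic features),
# each summarized by 25 statistical + 4 regression functionals.
n_spectral_energy = 211
n_voicing = 27
n_llds = n_spectral_energy + n_voicing   # 238 low-level descriptors
n_functionals = 25 + 4                   # 29 functionals applied per LLD
feature_dim = n_llds * n_functionals
print(feature_dim)  # 6902, matching the reported segment-level vector size
```

Each speech segment is thus represented by one functional value per (LLD, functional) pair, which is why the dimensionality is a simple product of the two counts.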
“…Most current vision-based approaches to automatic depression analysis [17], [18], [19], [20], [21] base their prediction on the non-verbal facial behaviours of participants during an interview. Several challenges remain before actionable results can be achieved in this scenario, and our proposed approach mainly focuses on addressing three of them.…”
Section: Introduction
confidence: 99%
“…Tested on AVEC2014 [19], the method could predict depressive behavior with over 80% accuracy. Yang et al [25] proposed a system hybridizing deep and shallow models for depression prediction from audio, video, and text descriptors. The researchers employed a DCNN–DNN model for audio-visual multimodal depression recognition using the PHQ-8 framework, and a paragraph vector (PV) analyzed the interview transcript using an SVM-based model in order to infer the physical and mental condition of the subject.…”
Section: Introduction
confidence: 99%
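The hybrid pipeline described above ultimately combines per-modality predictions. A minimal late-fusion sketch, assuming each branch emits a PHQ-8-style severity score and using illustrative weights (the function name, weights, and averaging scheme are assumptions for illustration, not the cited work's actual fusion):

```python
# Hypothetical late fusion of per-modality depression severity scores.
# Each branch (audio DCNN-DNN, visual DCNN-DNN, text PV+SVM) is assumed
# to output a PHQ-8-style score; fusion is a weighted average.
def fuse_scores(audio_score, visual_score, text_score,
                weights=(0.4, 0.4, 0.2)):
    scores = (audio_score, visual_score, text_score)
    total = sum(w * s for w, s in zip(weights, scores))
    return total / sum(weights)

fused = fuse_scores(12.0, 10.0, 14.0)
print(fused)  # 11.6
```

Weighted late fusion is one common way to combine heterogeneous deep and shallow branches, since each branch can be trained and tuned independently before combination.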