2018
DOI: 10.1016/j.jbi.2018.05.007

Automated depression analysis using convolutional neural networks from speech

Abstract: To help clinicians efficiently diagnose the severity of a person's depression, the affective computing and artificial intelligence communities have shown growing interest in designing automated systems. Speech features carry useful information for diagnosing depression. However, feature selection still depends on manual design and domain knowledge, which makes the process labor-intensive and subjective. In recent years, deep-learned features based on neural networks…

Cited by 172 publications (80 citation statements)
References 27 publications
“…They divided the speech into many fixed-length segments, then extracted 2268 features from these segments with the open-source Emotion and Affect Recognition (openEAR) toolkit [19]. However, using only these hand-crafted low-level features might lose other information associated with depression [12]. In addition, they aggregated the features extracted from the segments into a representation of the speech through average pooling, which is a special case of p-norm pooling [20] and not necessarily optimal for depression detection.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
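
The remark in the statement above, that average pooling is a special case of p-norm pooling, can be illustrated with a short sketch. It assumes the generalized-mean form of p-norm pooling, (1/N · Σ|x_i|^p)^(1/p), applied per feature dimension across segments. The function name `p_norm_pool` and the toy feature values are hypothetical and are not taken from the cited work. Under this assumption, p = 1 reduces to average pooling for non-negative features, and large p approaches max pooling.

```python
from typing import List


def p_norm_pool(segments: List[List[float]], p: float) -> List[float]:
    """Pool per-segment feature vectors into one utterance-level vector.

    Assumed form (not from the cited paper): the generalized mean
    (1/N * sum(|x_i|^p))^(1/p), computed independently per feature dimension.
    """
    if not segments:
        raise ValueError("need at least one segment")
    n = len(segments)
    dim = len(segments[0])
    pooled = []
    for d in range(dim):
        # Mean of the p-th powers of the absolute feature values.
        mean_pow = sum(abs(seg[d]) ** p for seg in segments) / n
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


# Hypothetical toy data: three segments, two features each.
segments = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]

# p = 1 gives the per-dimension average for non-negative features.
avg = p_norm_pool(segments, p=1)

# A large p moves each pooled value toward the per-dimension maximum.
near_max = p_norm_pool(segments, p=50)
```

Treating p as a tunable parameter is what makes the pooling step adjustable to a task, which is the point the statement makes against fixing p = 1.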
“…However, the number of Gaussian components was not adapted to the depression detection task, which affected the accuracy of prediction [22]. He et al. [12] proposed a four-stream CNN to detect an individual's depression level. Although CNNs are good at capturing spatial structure [23], they cannot adequately capture the impact of temporal changes on depression detection.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Most of the studies to date measure speech in controlled settings (e.g. recording participants reading passages aloud in quiet rooms) and focus on detecting specific features of speech (He & Cao, 2018; Jiang et al., 2018; Li et al., 2018). An alternative approach would be to use wearable devices to objectively detect how much speech participants encounter and produce in their natural environment.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…have used statistics of features, called low-level descriptors (LLD), that are related to both the vocal source and the vocal tract to improve the systems [12,4,14]; however, not all the statistical properties contribute to the improvements. Despite these advances, there seems to be no agreed-upon set of features for detecting depression from speech signals; moreover, the performance of all these systems may be limited by the choice of features and their statistics.…”
Section: Introduction
Citation type: mentioning
confidence: 99%