ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683498

Learning Voice Source Related Information for Depression Detection

Abstract: During depression, neurophysiological changes can occur, which may affect laryngeal control, i.e. the behaviour of the vocal folds. Characterising these changes precisely from speech signals is a non-trivial task, as this typically requires reliably separating the voice source information from the speech signal. In this paper, by exploiting the ability of CNNs to learn task-relevant information from raw input signals, we investigate several methods to model voice source related information for depression detec…

Citations: cited by 31 publications (19 citation statements)
References: 31 publications (35 reference statements)

“…"FC" layer in this architecture contains 100 nodes. The input to the CNNs is a 250ms signal, overlapped by a 10ms shift; these parameters are inspired from earlier works such as [13,17]. The targets to the CNNs are one-hot encodings of the dialects.…”
Section: Methods (mentioning)
confidence: 99%
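
The input framing described in the statement above (250 ms windows advanced by a 10 ms shift, with one-hot class targets) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the 16 kHz sampling rate and the helper names `frame_signal` and `one_hot` are assumptions introduced here, and trailing samples that do not fill a whole window are simply dropped.

```python
def frame_signal(samples, sample_rate, window_ms=250.0, shift_ms=10.0):
    """Split a 1-D sequence of samples into overlapping fixed-length windows."""
    window_len = int(round(sample_rate * window_ms / 1000.0))
    shift_len = int(round(sample_rate * shift_ms / 1000.0))
    if window_len <= 0 or shift_len <= 0:
        raise ValueError("window and shift must be at least one sample long")
    frames = []
    # Advance the window start by shift_len samples; a trailing partial
    # window is dropped (an assumption, the source does not say).
    for start in range(0, len(samples) - window_len + 1, shift_len):
        frames.append(samples[start:start + window_len])
    return frames


def one_hot(class_index, num_classes):
    """Return a one-hot target vector for the given class index."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    target = [0.0] * num_classes
    target[class_index] = 1.0
    return target


# Usage: one second of silence at an assumed 16 kHz sampling rate.
SAMPLE_RATE = 16000  # assumption, not stated in the source
frames = frame_signal([0.0] * SAMPLE_RATE, SAMPLE_RATE)
targets = [one_hot(2, 4) for _ in frames]  # hypothetical: class 2 of 4
```

Every window cut from an utterance shares that utterance's label, which is why the same target vector is repeated for each frame.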
“…During testing, such posterior probability vectors are averaged across each utterance and a decision is made at the utterance-level based on the highest probability. Our recent work [17] showed that filtering raw speech based on prior knowledge facilitates better task-specific modelling. Along similar lines, apart from directly modelling raw speech and using speed perturbation (SP) as suggested in [11] to partially address data scarcity, we propose to use the following signal processing techniques to extract signals rich in vocal-tract related information: (a) homomorphic filtering and (b) linear prediction based filtering.…”
Section: Source-filter Decomposition Based Did (mentioning)
confidence: 99%
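
The utterance-level decision rule quoted above (average the per-frame posterior vectors, then pick the class with the highest mean probability) can be sketched as follows. The function name `utterance_decision` and the example posteriors are hypothetical, chosen only for illustration.

```python
def utterance_decision(frame_posteriors):
    """Average per-frame posterior vectors over one utterance.

    Returns (best_class_index, mean_posteriors).
    """
    if not frame_posteriors:
        raise ValueError("need at least one frame")
    num_classes = len(frame_posteriors[0])
    totals = [0.0] * num_classes
    for posterior in frame_posteriors:
        if len(posterior) != num_classes:
            raise ValueError("all posterior vectors must have the same length")
        for k, p in enumerate(posterior):
            totals[k] += p
    mean = [t / len(frame_posteriors) for t in totals]
    # The utterance is assigned the class with the highest mean posterior.
    best = max(range(num_classes), key=lambda k: mean[k])
    return best, mean


# Usage with made-up posteriors for three frames and two classes.
example = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
best_class, mean_posteriors = utterance_decision(example)
```

Averaging before taking the maximum lets confident frames outweigh uncertain ones, in contrast to a majority vote over per-frame decisions.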
“…We used the raw-speech based CNN framework originally developed for speech recognition [15], and later extended to other tasks such as speaker verification [11], gender identification [8], presentation attack detection [12] or depression detection [5]. In this framework, as illustrated in Figure 1, the network consists of N convolution layers (Conv), maximum pooling (MaxP) and ReLU activations followed by a multilayer perceptron (MLP).…”
Section: Proposed Systems (mentioning)
confidence: 99%
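
The layer ordering described in the statement above (N blocks of convolution, max pooling and ReLU, followed by an MLP) can be sketched as a forward pass in plain Python. This is a toy illustration under stated assumptions, not the cited system: the layer sizes, kernel widths, strides, random weights, softmax output and the `demo` helper are all hypothetical, and no training is shown.

```python
import math
import random


def conv1d(x, weights, biases, stride):
    """Valid 1-D convolution. x[c][t] is the input; weights[o][c][k] the filters."""
    kernel = len(weights[0][0])
    out_len = (len(x[0]) - kernel) // stride + 1
    out = []
    for w_o, b in zip(weights, biases):
        row = []
        for t in range(out_len):
            s = b
            for c, w_c in enumerate(w_o):
                for k, w in enumerate(w_c):
                    s += w * x[c][t * stride + k]
            row.append(s)
        out.append(row)
    return out


def max_pool(x, size):
    """Non-overlapping max pooling along time, per channel."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in x]


def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]


def dense(v, weights, biases):
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def forward(signal, conv_layers, hidden, output, pool=2):
    """Conv -> MaxP -> ReLU for each layer, then a one-hidden-layer MLP."""
    x = [list(signal)]  # single input channel: the raw waveform
    for weights, biases, stride in conv_layers:
        x = relu(max_pool(conv1d(x, weights, biases, stride), pool))
    flat = [v for ch in x for v in ch]
    h = [max(0.0, v) for v in dense(flat, *hidden)]
    return softmax(dense(h, *output))  # class posteriors


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def demo():
    """Build a tiny two-layer network with random weights (illustrative sizes)."""
    rng = random.Random(0)
    signal = [math.sin(0.3 * t) for t in range(64)]
    conv_layers = [
        ([random_matrix(1, 8, rng) for _ in range(2)], [0.0] * 2, 4),  # 1 -> 2 channels
        ([random_matrix(2, 3, rng) for _ in range(3)], [0.0] * 3, 1),  # 2 -> 3 channels
    ]
    hidden = (random_matrix(4, 6, rng), [0.0] * 4)  # 3 channels x 2 steps = 6 inputs
    output = (random_matrix(2, 4, rng), [0.0] * 2)  # two classes
    return forward(signal, conv_layers, hidden, output)
```

Calling `demo()` returns a two-element posterior vector. A practical system would use a deep learning framework and learn the filters from data; the point here is only the order of operations.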
“…In the recent years, with the advances in deep learning, approaches have emerged where task-related information are directly learned from raw speech signals using convolutional neural networks (CNNs) in an end-to-end manner, i.e. without any short-term spectral processing [5,11,12,14,18,22,24]. This paper investigates the use of such an approach for the degree of sleepiness estimation.…”
mentioning
confidence: 99%