Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3479236
DEPA: Self-Supervised Audio Embedding for Depression Detection

Cited by 28 publications (10 citation statements) · References 17 publications
“…A DNN trained on such data would under-fit; consequently, the classification results are less convincing. One workable solution to this problem is to pre-train a model on extensive data and then transfer the model’s knowledge to downstream tasks [e.g., speaker recognition (Snyder et al., 2018), PD detection (Moro-Velazquez et al., 2020), depression detection (Zhang et al., 2021)]. Notably, the results in Zhang et al. (2021) showed that a larger out-of-domain dataset (e.g., speech recognition) for audio-embedding pre-training generally improves performance more than a relatively small in-domain (depression detection) dataset.…”
Section: Related Work (mentioning)
confidence: 99%
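The pre-train-then-transfer recipe described in this statement can be sketched in a few lines of PyTorch. This is a minimal illustration only: the encoder, the 1000-speaker head, and all dimensions are hypothetical placeholders, not the setup of the cited papers.

```python
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    """Toy utterance-level encoder; stands in for any pre-trainable backbone."""
    def __init__(self, n_mels=64, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time -> one vector per utterance
        )
        self.proj = nn.Linear(128, emb_dim)

    def forward(self, x):            # x: (batch, n_mels, frames)
        h = self.net(x).squeeze(-1)  # (batch, 128)
        return self.proj(h)          # (batch, emb_dim)

# 1) Pre-train on a large out-of-domain task (e.g. speaker classification).
encoder = AudioEncoder()
speaker_head = nn.Linear(128, 1000)  # hypothetical 1000-speaker corpus
# ... train encoder + speaker_head on the large out-of-domain dataset ...

# 2) Transfer: reuse the encoder, attach a small depression-detection head.
for p in encoder.parameters():
    p.requires_grad = False          # freeze, or fine-tune with a small lr
depression_head = nn.Linear(128, 2)  # depressed vs. control
logits = depression_head(encoder(torch.randn(4, 64, 300)))
```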
“…One workable solution to this problem is to pre-train a model on extensive data and then transfer the model’s knowledge to downstream tasks [e.g., speaker recognition (Snyder et al., 2018), PD detection (Moro-Velazquez et al., 2020), depression detection (Zhang et al., 2021)]. Notably, the results in Zhang et al. (2021) showed that a larger out-of-domain dataset (e.g., speech recognition) for audio-embedding pre-training generally improves performance more than a relatively small in-domain (depression detection) dataset. Therefore, we pre-trained speaker embedding extractors on CN-Celeb (Fan et al., 2020), a large-scale Chinese speaker recognition dataset, and then extracted the corresponding embeddings on our Chinese depression speech dataset.…”
Section: Related Work (mentioning)
confidence: 99%
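In code, the extraction step could look like the sketch below, assuming an utterance-level extractor already pre-trained on CN-Celeb. The toy extractor and the dummy batch are placeholders for illustration, not the actual model of Fan et al. (2020) or the cited paper's pipeline.

```python
import torch
import torch.nn as nn

# Stand-in for a speaker-embedding extractor pre-trained on CN-Celeb;
# in practice the weights would be loaded from a saved checkpoint.
extractor = nn.Sequential(
    nn.Conv1d(64, 512, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(512, 192),
)
extractor.eval()

# Dummy in-domain batch: 8 depression-speech utterances, 64 mel bins, 300 frames.
mel_batch = torch.randn(8, 64, 300)
with torch.no_grad():              # extractor stays frozen during extraction
    emb = extractor(mel_batch)     # (8, 192) utterance-level embeddings
print(emb.shape)                   # these embeddings feed a downstream classifier
```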
“…One is to extract hand-crafted features from speech signals and then feed them into a deep neural network [37], where the deep framework is used only as a classifier. The other is to apply an end-to-end deep architecture, which feeds the raw audio signal or spectrum into a deep network to learn high-level features automatically [38]. Because it sidesteps the problems of hand-crafted features, such as a high entry threshold, labour cost, and low feature utilization, deep learning has gradually become the leading approach in machine learning.…”
Section: Research Evolution (mentioning)
confidence: 99%
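The two paradigms contrasted in this statement can be shown side by side in a short sketch. The MFCC settings, layer sizes, and the learned front end below are illustrative assumptions, not the architectures of [37] or [38].

```python
import torch
import torch.nn as nn
import torchaudio

wav = torch.randn(1, 16000)  # one second of dummy 16 kHz audio

# Paradigm 1: hand-crafted features (MFCCs); the deep net is only a classifier.
mfcc = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=40)(wav)  # (1, 40, frames)
clf = nn.Sequential(nn.Flatten(), nn.LazyLinear(64), nn.ReLU(), nn.Linear(64, 2))
logits_handcrafted = clf(mfcc)

# Paradigm 2: end-to-end; raw waveform in, features learned by the network itself.
e2e = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=400, stride=160), nn.ReLU(),  # learned front end
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 2),
)
logits_e2e = e2e(wav.unsqueeze(0))  # input shape (batch=1, channel=1, samples)
```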
“…It avoids problems such as vanishing gradients to a certain extent and can learn information over long time series, so it is well suited to sequential data such as speech. Since deep learning methods became popular in the field of SDR, a number of RNN-based studies have been carried out [37, 38, 77, 88, 89]. Alhanai et al.…”
Section: Research Evolution (mentioning)
confidence: 99%
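As a concrete illustration of how a gated RNN handles such sequences, the sketch below runs a bidirectional LSTM over a series of acoustic frames and uses the final hidden states as an utterance summary; all dimensions are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Frame sequence: batch of 4 utterances, 300 frames, 40 features per frame.
frames = torch.randn(4, 300, 40)

lstm = nn.LSTM(input_size=40, hidden_size=128, num_layers=2,
               batch_first=True, bidirectional=True)
out, (h_n, c_n) = lstm(frames)   # out: (4, 300, 256); gating curbs vanishing gradients

# Concatenate the last layer's forward and backward hidden states.
summary = torch.cat([h_n[-2], h_n[-1]], dim=-1)   # (4, 256) utterance summary
logits = nn.Linear(256, 2)(summary)               # depressed vs. control
```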
“…[16] proposes a multi-task TCN learning model to estimate the degree of depression, combining it with related tasks such as emotion recognition. [17] proposes a self-supervised pre-training network for acoustic feature extraction in depression detection, which uses a convolutional encoding-decoding structure combined with an LSTM network for depression recognition. The recognition accuracy is thereby improved to a certain extent.…”
Section: Introduction (mentioning)
confidence: 99%
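A rough sketch of the encode-decode pre-training idea that [17] describes follows. The layer sizes and the way the encoder output feeds the LSTM are assumptions made for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Convolutional encode-decode pre-training on spectrograms (illustrative)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, spec):      # spec: (batch, 1, mels, frames)
        z = self.enc(spec)        # learned acoustic representation
        return self.dec(z), z

ae = ConvAutoencoder()
spec = torch.randn(4, 1, 64, 128)
recon, z = ae(spec)
loss = nn.functional.mse_loss(recon, spec)  # self-supervised reconstruction loss

# Downstream: flatten channels and frequency, feed an LSTM classifier.
feats = z.flatten(1, 2).transpose(1, 2)     # (4, 32 time steps, 512 features)
lstm = nn.LSTM(512, 128, batch_first=True)
_, (h, _) = lstm(feats)
logits = nn.Linear(128, 2)(h[-1])           # depression recognition head
```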