Interspeech 2017
DOI: 10.21437/interspeech.2017-94
Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network

Abstract: Estimating continuous emotional states from speech as a function of time has traditionally been framed as a regression problem. In this paper, we present a novel approach that moves the problem into the classification domain by discretizing the training labels at different resolutions. We employ a multi-task deep bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) trained with cost-sensitive cross-entropy loss to model these labels jointly. We introduce an emotion decoding algorithm tha…
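The abstract sketches the approach at a high level: continuous emotion labels are discretized at several resolutions, and a multi-task BLSTM models all resolutions jointly under a cost-sensitive cross-entropy loss. Below is a minimal PyTorch sketch of that idea; the feature dimension, bin counts, and inverse-frequency class weighting are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiResolutionBLSTM(nn.Module):
    """BLSTM that predicts the same continuous label discretized at
    several resolutions (one softmax head per resolution). Sizes and
    bin counts are illustrative, not the paper's settings."""

    def __init__(self, n_features=40, hidden=128, bin_counts=(4, 8, 16)):
        super().__init__()
        self.blstm = nn.LSTM(n_features, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        # One classification head per label resolution.
        self.heads = nn.ModuleList(
            nn.Linear(2 * hidden, k) for k in bin_counts)
        self.bin_counts = bin_counts

    def forward(self, x):                    # x: (batch, time, features)
        h, _ = self.blstm(x)                 # (batch, time, 2*hidden)
        return [head(h) for head in self.heads]

def discretize(labels, k):
    """Map continuous labels in [-1, 1] to k equal-width bins."""
    return ((labels + 1.0) / 2.0 * k).long().clamp(0, k - 1)

def multitask_loss(logits_list, labels, bin_counts):
    """Cost-sensitive cross entropy summed over resolutions. Inverse
    class frequency is one plausible reading of "cost-sensitive"; the
    paper's exact weighting scheme may differ."""
    total = 0.0
    for logits, k in zip(logits_list, bin_counts):
        target = discretize(labels, k)        # (batch, time)
        freq = torch.bincount(target.flatten(), minlength=k).float()
        weight = freq.sum() / (freq + 1.0)    # inverse-frequency costs
        ce = nn.CrossEntropyLoss(weight=weight)
        total = total + ce(logits.reshape(-1, k), target.reshape(-1))
    return total
```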

Cited by 48 publications (31 citation statements)
References 22 publications
Citing publications: 2017-2024
“…We only show results that are reported on the audio modality in the results tables. We also compare our performance to that of an optimized BLSTM regression model, described in [24]. Our final dilated convolution structure has a depth of 10 layers, each having a width of 32.…”
Section: Methods (mentioning)
confidence: 99%
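This citing work describes a dilated convolution structure 10 layers deep, each layer 32 units wide, compared against the BLSTM regression baseline of [24]. A rough sketch of such a stack follows; the kernel size, dilation schedule, and input dimension are assumptions, since the exact architecture is not given here.

```python
import torch.nn as nn

class DilatedConvStack(nn.Module):
    """Ten dilated 1-D conv layers of 32 channels each, dilation
    doubling per layer (an assumed schedule), for frame-level
    regression of an emotion trajectory."""

    def __init__(self, n_features=40, channels=32, depth=10, kernel=3):
        super().__init__()
        layers = []
        in_ch = n_features
        for i in range(depth):
            d = 2 ** i  # dilation doubles each layer (assumption)
            layers += [nn.Conv1d(in_ch, channels, kernel,
                                 padding=d, dilation=d),
                       nn.ReLU()]
            in_ch = channels
        self.net = nn.Sequential(*layers)
        self.out = nn.Conv1d(channels, 1, 1)  # frame-level prediction

    def forward(self, x):                # x: (batch, features, time)
        return self.out(self.net(x)).squeeze(1)   # (batch, time)
```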
“…A wealth of research pertains to multi-label learning for affect recognition, using a single database labeled with a few affective dimensions, predominantly arousal/valence/dominance [12,13,14]. Xia and Liu [15] proposed a method to learn a main task (emotion classification) and a secondary task (arousal/valence recognition).…”
Section: Multi-dimensional Affect Recognition (mentioning)
confidence: 99%
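To make the main/secondary split concrete, here is a compact shared-encoder sketch in the spirit of Xia and Liu [15]; the layer sizes and the form of the secondary target are assumptions.

```python
import torch.nn as nn

class MainSecondaryNet(nn.Module):
    """Shared encoder with a main emotion-class head and a secondary
    arousal/valence head. Dimensions are illustrative assumptions."""

    def __init__(self, n_features=40, hidden=256, n_emotions=4):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.emotion_head = nn.Linear(hidden, n_emotions)  # main task
        self.av_head = nn.Linear(hidden, 2)  # secondary: arousal/valence

    def forward(self, x):
        z = self.shared(x)
        return self.emotion_head(z), self.av_head(z)
```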
“…The most common approach is to use a recurrent neural network (RNN) or one of its variants. Long short-term memory (LSTM) networks have been used in several works to improve a model's ability to capture long-term dependencies in a time series [16,17]. Moreover, to focus on the salient time steps in a sequence, an attention mechanism is often added to the LSTM and has shown its effectiveness [18].…”
Section: Model Architecture (mentioning)
confidence: 99%
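One standard way to realize the attention pooling this snippet describes is a learned per-step score followed by a softmax over time; the sketch below assumes that formulation and illustrative layer sizes.

```python
import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    """LSTM whose hidden states are pooled by a learned attention
    weighting over time steps, in the style the snippet attributes
    to [18]. Sizes and scoring form are assumptions."""

    def __init__(self, n_features=40, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)    # per-step attention score

    def forward(self, x):                    # x: (batch, time, features)
        h, _ = self.lstm(x)                  # (batch, time, hidden)
        a = torch.softmax(self.score(h), dim=1)  # weights over time
        return (a * h).sum(dim=1)            # pooled: (batch, hidden)
```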