A Multi-Task Learning Framework for Emotion Recognition Using 2D Continuous Space

Xia, Rui; Liu, Yang

doi:10.1109/taffc.2015.2512598

Cited by 161 publications

(101 citation statements)

References 30 publications


“…Following this work, Han et al [18] combined the emotion prediction with an annotation uncertainty as joint tasks to be learnt together. Xia and Liu [19] suggested incorporating the losses from both the categorical and the dimensional emotion recognition to optimise the neural networks. Zhang et al [20] investigated MTL in a cross-corpus scenario, where many auxiliary tasks, such as corpus, domain, and gender distinctions, were considered to be optimised along with emotion recognition.…”

Section: Related Work (mentioning)

confidence: 99%

Attention-augmented End-to-end Multi-task Learning for Emotion Prediction from Speech

Zhang

Schuller

2019

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Despite the increasing research interest in end-to-end learning systems for speech emotion recognition, conventional systems either suffer from the overfitting due in part to the limited training data, or do not explicitly consider the different contributions of automatically learnt representations for a specific task. In this contribution, we propose a novel end-to-end framework which is enhanced by learning other auxiliary tasks and an attention mechanism. That is, we jointly train an end-to-end network with several different but related emotion prediction tasks, i.e., arousal, valence, and dominance predictions, to extract more robust representations shared among various tasks than traditional systems with the hope that it is able to relieve the overfitting problem. Meanwhile, an attention layer is implemented on top of the layers for each task, with the aim to capture the contribution distribution of different segment parts for each individual task. To evaluate the effectiveness of the proposed system, we conducted a set of experiments on the widely used database IEMOCAP. The empirical results show that the proposed systems significantly outperform corresponding baseline systems.
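The joint training idea described above, where a categorical emotion loss and a dimensional emotion loss are optimised together, can be illustrated with a small sketch. This is not the implementation of any of the cited papers: the function names, the weighting parameter `alpha`, and the pairing of cross-entropy with mean squared error are all assumptions made for the example.

```python
import math


def cross_entropy(probs, label_index):
    """Negative log-likelihood of the true class given predicted probabilities."""
    return -math.log(probs[label_index])


def mean_squared_error(pred, target):
    """Mean squared error over the dimensional targets (e.g. arousal, valence)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(probs, label_index, dims_pred, dims_true, alpha=0.5):
    """Weighted sum of a categorical and a dimensional loss.

    `alpha` is a hypothetical trade-off weight: 1.0 keeps only the
    categorical term, 0.0 keeps only the dimensional term.
    """
    categorical = cross_entropy(probs, label_index)
    dimensional = mean_squared_error(dims_pred, dims_true)
    return alpha * categorical + (1.0 - alpha) * dimensional


# Toy example: four emotion classes and two continuous dimensions.
loss = joint_loss(
    probs=[0.7, 0.1, 0.1, 0.1],
    label_index=0,
    dims_pred=[0.6, 0.2],
    dims_true=[0.5, 0.0],
    alpha=0.5,
)
```

In a real network both terms would be computed from a shared representation, so the gradient of each task influences the shared layers; the weight between the tasks is typically tuned on held-out data.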



“…The following databases are used in our experiments: , where we merge excitement and happiness class into the latter one [5], [6], [9], [10].…”

Section: Experiments and Results (mentioning)

confidence: 99%

“…We repeat the evaluation by reversing the roles of the two speakers. In the final assessment, we report the average performance obtained in terms of WA and UA obtained from all speakers [5], [6], [10]. In order to be easily comparable with the literature we follow three different normalization schemes.…”

Section: Leave One Session Out (LOSO) (mentioning)

confidence: 99%

Integrating Recurrence Dynamics for Speech Emotion Recognition

Tzinis

Paraskevopoulos

Baziotis

et al. 2018

Interspeech 2018


We investigate the performance of features that can capture nonlinear recurrence dynamics embedded in the speech signal for the task of Speech Emotion Recognition (SER). Reconstruction of the phase space of each speech frame and the computation of its respective Recurrence Plot (RP) reveals complex structures which can be measured by performing Recurrence Quantification Analysis (RQA). These measures are aggregated by using statistical functionals over segment and utterance periods. We report SER results for the proposed feature set on three databases using different classification methods. When fusing the proposed features with traditional feature sets, e.g., [1], we show an improvement in unweighted accuracy of up to 5.7% and 10.7% on Speaker-Dependent (SD) and Speaker-Independent (SI) SER tasks, respectively, over the baseline [1]. Following a segment-based approach we demonstrate state-of-the-art performance on IEMOCAP using a Bidirectional Recurrent Neural Network.
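The citation statements and abstract above report results as WA and UA. A minimal sketch of the two metrics follows, assuming the convention common in speech emotion recognition: weighted accuracy (WA) is the overall fraction of correct predictions, and unweighted accuracy (UA) is the mean of the per-class recalls. The function names and the toy labels are illustrative and do not come from the cited papers.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: every utterance counts equally."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every class counts equally."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Imbalanced toy data: three neutral utterances, one happy utterance.
y_true = ["neutral", "neutral", "neutral", "happy"]
y_pred = ["neutral", "neutral", "neutral", "sad"]

wa = weighted_accuracy(y_true, y_pred)    # 3 of 4 correct -> 0.75
ua = unweighted_accuracy(y_true, y_pred)  # recalls 1.0 and 0.0 -> 0.5
```

The gap between the two values shows why both are usually reported on imbalanced corpora: a classifier that ignores a minority class can still score well on WA while UA drops.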


“…There are 10 discrete emotion labels. For this study, we utilize the same category as in [6,18,19]: angry, happy, sad and neutral. To represent the majority of the emotion categories in the database, happy and excited are merged into happy.…”

Section: Methods (mentioning)

confidence: 99%

Speech Emotion Recognition via Contrastive Loss under Siamese Networks

Lian

Tao

et al. 2018

Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and First Multi-Modal Affective


Speech emotion recognition is an important aspect of human-computer interaction. Prior work proposes various end-to-end models to improve the classification performance. However, most of them rely on the cross-entropy loss together with softmax as the supervision component, which does not explicitly encourage discriminative learning of features. In this paper, we introduce the contrastive loss function to encourage intra-class compactness and inter-class separability between learnable features. Furthermore, multiple feature selection methods and pairwise sample selection methods are evaluated. To verify the performance of the proposed system, we conduct experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, a common evaluation corpus. Experimental results reveal the advantages of the proposed method, which reaches 62.19% in weighted accuracy and 63.21% in unweighted accuracy. It outperforms the baseline system, which is optimized without the contrastive loss function, by 1.14% and 2.55% in weighted accuracy and unweighted accuracy, respectively.
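To make the idea of intra-class compactness and inter-class separability concrete, here is a minimal sketch of a pairwise contrastive loss. It assumes the classic margin-based form with Euclidean distance; the abstract does not give the exact formulation used in the paper, so the 0.5 scaling, the `margin` parameter, and the function names are assumptions for illustration only.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same_class, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Same-class pairs are penalised by their squared distance (pulled together).
    Different-class pairs are penalised only while they are closer than
    `margin` (pushed apart until the margin is reached).
    """
    d = euclidean_distance(a, b)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# A same-class pair that is far apart incurs a large loss.
pull = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=True)

# A different-class pair already beyond the margin incurs no loss.
push_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=False)

# A different-class pair inside the margin is still penalised.
push_near = contrastive_loss([0.0], [0.5], same_class=False)
```

In a Siamese setup the two embeddings would come from the same network applied to two utterances, and the loss would be averaged over sampled pairs.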

