Interspeech 2019
DOI: 10.21437/interspeech.2019-2406

Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition

Abstract: Conventionally, speech emotion recognition is achieved using passive learning approaches. Differing from such approaches, we herein propose and develop a dynamic method of autonomous emotion learning based on zero-shot learning. The proposed methodology employs emotional dimensions as the attributes in the zero-shot learning paradigm, resulting in two phases of learning, namely attribute learning and label learning. Attribute learning connects the paralinguistic features and attributes utilising speech with kn…

Cited by 13 publications (23 citation statements)
References 28 publications
“…Zero-Shot Speech Emotion Recognition: A basic framework for zero-shot learning in SER, containing two phases of attribute learning and label learning, is presented in [24]. The attribute-learning phase constructs the relationship between paralinguistic features and emotional descriptors or attributes, through the procedure of regression on seen-emotional samples using Support Vector Regression (SVR) or Deep Neural Networks (DNNs).…”
Section: A Related Work (mentioning)
confidence: 99%
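
The two phases described in the statement above can be illustrated with a minimal Python sketch. It is not the cited paper's implementation: ordinary least squares stands in for SVR or a DNN in the attribute-learning phase, label learning is assumed to be a nearest-prototype lookup in attribute space, and the two-dimensional features, the (arousal, valence) attribute values, and the emotion names are all hypothetical toy data.

```python
from math import dist


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_attribute_regressor(features, targets):
    """Least-squares fit for one attribute dimension (stand-in for SVR/DNN)."""
    x = [f + [1.0] for f in features]  # append a bias term
    d = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(x, targets)) for i in range(d)]
    return solve(xtx, xty)


def predict_attributes(weights, feature):
    """Map one feature vector to its predicted attribute vector."""
    x = feature + [1.0]
    return [sum(w * v for w, v in zip(ws, x)) for ws in weights]


def classify(weights, feature, prototypes):
    """Assumed label-learning step: pick the nearest attribute prototype."""
    attrs = predict_attributes(weights, feature)
    return min(prototypes, key=lambda name: dist(attrs, prototypes[name]))


# Hypothetical seen-emotion samples: features and (arousal, valence) attributes.
seen_features = [[0.9, 0.2], [0.2, 0.2], [0.8, 0.9]]    # anger, sadness, happiness
seen_attributes = [[0.8, -0.6], [-0.6, -0.6], [0.6, 0.8]]

# Phase 1 (attribute learning): one regressor per attribute dimension.
weights = [
    fit_attribute_regressor(seen_features, [a[k] for a in seen_attributes])
    for k in range(2)
]

# Phase 2 (label learning): unseen emotions are described only by attributes.
unseen_prototypes = {"relaxed": [-0.5, 0.6], "fear": [0.7, -0.7]}

predicted = predict_attributes(weights, [0.25, 0.8])
label = classify(weights, [0.25, 0.8], unseen_prototypes)
print(label, predicted)
```

Because the unseen emotions enter only through their attribute prototypes, no speech sample of "relaxed" or "fear" is needed to fit the regressors, which is the point of the zero-shot setup.
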
“…In order to deal with recognising unseen-emotional speech samples, we propose an autonomous learning strategy based on zero-shot emotion recognition [24]. Zero-Shot Learning (ZSL) has demonstrated high utility in image processing [25]- [28] and affective computing [29]- [31].…”
Section: Introduction (mentioning)
confidence: 99%
“…It has been shown that the CNN module combined with long short-term memory (LSTM) neural networks works better than standalone CNN or LSTM based models for SER tasks in cross-corpus setting [37]. Other deep learning-based models like zero-shot learning, which learns using only a few labels [51] and Generative Adversarial Networks (GANs) to generate synthetic samples for robust learning have also been studied [10].…”
Section: Related Work (mentioning)
confidence: 99%
“…Computational paralinguistics make it possible to extract latent knowledge in audio signals (i.e., spoken signals) from human beings or animals [1][2][3]. Typical paralinguistics-related topics include emotion and personality recognition [4][5][6], autism diagnosis [7], native-speaker identification [8], or eating classification [9]. As an emerging topic in paralinguistics, Mask-Speech Identification (MSI) attempts to automatically distinguish whether a spoken utterance is pronounced by its speaker with or without a surgical mask [4].…”
Section: Introduction (mentioning)
confidence: 99%