Interspeech 2019
DOI: 10.21437/interspeech.2019-3068

Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition

Abstract: Emotion recognition from speech plays a significant role in adding emotional intelligence to machines and making human-machine interaction more natural. One of the key challenges from a machine learning standpoint is to extract patterns that bear maximum correlation with the emotion information encoded in the signal while being as insensitive as possible to other types of information carried by speech. In this paper, we propose a novel temporal modelling framework for robust emotion classification using bidirec…
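The "capsule routing" named in the title refers to routing-by-agreement between capsule layers. As an illustration only (this is a generic NumPy sketch of dynamic routing in the style of Sabour et al., not the paper's own implementation; the function names, shapes, and iteration count are assumptions):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linear "squash": preserves vector orientation, maps norm into [0, 1).
    norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iter=3):
    # u_hat: prediction vectors from lower capsules, shape (n_in, n_out, dim_out).
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, refined by agreement
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)  # weighted sum per output capsule
        v = squash(s)                           # output capsules, shape (n_out, dim_out)
        b += (u_hat * v[None]).sum(axis=-1)     # increase logits where predictions agree
    return v
```

Each routing iteration sends more of a lower capsule's output to the higher-level capsules whose current outputs agree with its prediction, which is what lets the network form the "temporal clusters" the paper describes.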


Cited by 52 publications (25 citation statements). References 27 publications.
“…Yuni et al [43] presented a spectrogram-based CNN model for multi-class audio classification, combining two models to achieve 64.48% accuracy in multitask SER. Jalal et al [44] and Anjali et al [45] used the log-spectrogram and spectral features to recognize emotion in speech data, with 68% and 75% accuracy, respectively. Table 10 shows the computational simplicity of the proposed DSCNN model compared with other baseline CNN models on the IEMOCAP dataset for SER.…”
Section: Discussion
confidence: 99%
“…
Model             Input               Weighted Acc.  Unweighted Acc.  Accuracy
Zeng et al [43]   Spectrograms        -              -                64.48%
Jalal et al [44]  log-spectrogram     -              69.4%            68.10%
Bhavan et al [45] spectral features   -              -                75.69%
Proposed model    Raw_Spectrograms    68%            61%              70.00%
Proposed model    Clean_Spectrograms  80%            79%              79.5%

Table 10. Computational comparison of the suggested DSCNN model with other baseline CNN models.…”
Section: Table 10 (Input, Weighted Accuracy, Unweighted Accuracy, Accuracy)
confidence: 99%
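Several of the systems compared above take a log-spectrogram as input. A minimal sketch of that front end, assuming a 16 kHz signal, a 512-sample window, and a 160-sample hop (these parameters are illustrative, not taken from any of the cited papers):

```python
import numpy as np
from scipy.signal import spectrogram

def log_spectrogram(wave, sr=16000, n_fft=512, hop=160):
    # Magnitude spectrogram via the short-time Fourier transform,
    # followed by log compression; small epsilon avoids log(0).
    f, t, sxx = spectrogram(wave, fs=sr, nperseg=n_fft,
                            noverlap=n_fft - hop, mode="magnitude")
    return np.log(sxx + 1e-10)  # shape: (freq_bins, frames)
```

For a 512-point window the output has 257 frequency bins, and the frame axis grows with signal length; this 2-D array is what a spectrogram-based CNN would consume.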
“…Philosophically, it is similar to the context-expansion technique in feature-space minimum phone error (fMPE) [16,17]. Sequential and hybrid-hierarchical models were proposed to learn deep feature representations [12,14] and task-specific feature clusters [13].…”
Section: Related Work
confidence: 99%
“…DNNs learn task-specific abstract feature representations by filtering out unnecessary information and improving generalisation [8,9,10]. Research has suggested representation learning by modelling mid- to long-term sequence dependencies [11,12,13].…”
Section: Introduction
confidence: 99%