2023
DOI: 10.1109/taslp.2023.3250840

Learning Speech Emotion Representations in the Quaternion Domain

Abstract: The modeling of human emotion expression in speech signals is an important, yet challenging task. The high resource demand of speech emotion recognition models, combined with the general scarcity of emotion-labelled data are obstacles to the development and application of effective solutions in this field. In this paper, we present an approach to jointly circumvent these difficulties. Our method, named RH-emo, is a novel semisupervised architecture aimed at extracting quaternion embeddings from real-valued mon…
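To make the abstract's core idea concrete, the sketch below shows one way a real-valued spectrogram could be packed into the four components of a quaternion-valued input, the kind of representation a quaternion network consumes. It is an illustrative assumption only, not the learned RH-emo encoder described in the paper; the function name, channel choices, and shapes are hypothetical.

```python
# Illustrative sketch: packing a real-valued (freq, time) spectrogram into a
# (4, freq, time) quaternion-style tensor with components (r, i, j, k).
# In RH-emo these components would be produced by a learned encoder instead.
import numpy as np

def to_quaternion_channels(spectrogram: np.ndarray) -> np.ndarray:
    """Map a real-valued spectrogram to four quaternion components (assumed layout)."""
    r = np.zeros_like(spectrogram)        # real component (placeholder)
    i = spectrogram                       # raw magnitudes
    j = np.gradient(spectrogram, axis=1)  # temporal derivative (delta)
    k = np.gradient(spectrogram, axis=0)  # spectral derivative
    return np.stack([r, i, j, k], axis=0)

# Example: a dummy 128-bin, 100-frame spectrogram becomes a 4-channel tensor.
spec = np.random.rand(128, 100).astype(np.float32)
print(to_quaternion_channels(spec).shape)  # (4, 128, 100)
```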

Cited by 16 publications (8 citation statements)
References 73 publications

“…Firstly, this database is often used in research on emotional speech [22,23] and is widely available. Secondly, as presented by the authors of [24], this database enables the highest efficiency of classification (88.47% [25]) compared to other databases, such as RAVDESS (87.5% [15,26]) or IEMOCAP (75.60% [27,28]). However, unlike the TESS [29] database, which is classified with an accuracy of 99.6% [30], it contains longer utterances, not just words with the same prefix.…”
Section: Audio Data
confidence: 93%
“…For the classifier, we opted to use a Convolutional Neural Network (CNN) as a deep learning model, as it has shown good performance in audio classification and speech recognition tasks [23]. Some papers in the literature that use CNN for the Speech Emotion Recognition (SER) problem can be seen in [3], [4], [8], [24], [25].…”
Section: Deep Learning
confidence: 99%
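The preceding statement points to CNN classifiers as a common baseline for speech emotion recognition. The following minimal PyTorch sketch illustrates that kind of model on mel-spectrogram input; it is an assumption for illustration, not a model from any of the cited works, and the layer sizes and four-class output are arbitrary.

```python
# Minimal illustrative CNN for speech emotion recognition over spectrograms.
# Architecture details are assumptions, not taken from the cited papers.
import torch
import torch.nn as nn

class SimpleSERCNN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, freq_bins, time_frames)
        return self.classifier(self.features(x).flatten(1))

# Example forward pass on a batch of two 128x100 spectrograms.
logits = SimpleSERCNN()(torch.randn(2, 1, 128, 100))
print(logits.shape)  # torch.Size([2, 4])
```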
“…The choice of algorithm is crucial for successful recognition, as different algorithms have varying degrees of precision and efficiency. The choice of database is also essential, as the data quality used for training and testing can directly affect the system's ability to recognize emotions [4]. Additionally, the feature extraction technique is a critical factor, as selecting relevant features for the recognition task is essential for obtaining good results.…”
Section: Introduction
confidence: 99%
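The statement above singles out feature extraction as a critical step. As a hedged example of one common choice for that step, the sketch below computes per-utterance MFCC statistics with librosa; the file path, 16 kHz sample rate, and 13 coefficients are illustrative assumptions, and the cited works may use different features and settings.

```python
# Hedged example of a typical SER feature-extraction step: MFCCs via librosa.
import librosa
import numpy as np

def extract_mfcc(path: str, n_mfcc: int = 13) -> np.ndarray:
    """Return per-utterance MFCC means for a wav file (assumed pipeline)."""
    y, sr = librosa.load(path, sr=16000)                      # resample to 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                  # summarize over frames

# features = extract_mfcc("utterance.wav")  # hypothetical file
# print(features.shape)  # (13,)
```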
“…In recent studies on quaternions, researchers have achieved remarkable results by employing quaternion convolutional networks (QCNN) for processing speech signals [24,25]. Quaternions, a type of hypercomplex number, enable the exploration and preservation of underlying connections within the data through quaternion convolution.…”
Section: Introduction
confidence: 99%
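The quaternion convolution mentioned in the last statement shares weights across the four components through the Hamilton product. The sketch below shows that weight-sharing pattern with real-valued convolutions in PyTorch; the channel layout (first quarter of channels = real part, then i, j, k) and layer sizes are assumptions, and this is not the exact implementation used in RH-emo or the cited QCNN works.

```python
# Minimal sketch of a quaternion 2-D convolution: the Hamilton product W ⊗ x
# expressed with four real-valued kernels. Layout and sizes are assumptions.
import torch
import torch.nn as nn

class QuaternionConv2d(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3, padding: int = 1):
        super().__init__()
        assert in_channels % 4 == 0 and out_channels % 4 == 0
        ic, oc = in_channels // 4, out_channels // 4
        # One real-valued kernel per quaternion weight component (r, i, j, k).
        self.wr = nn.Conv2d(ic, oc, kernel_size, padding=padding, bias=False)
        self.wi = nn.Conv2d(ic, oc, kernel_size, padding=padding, bias=False)
        self.wj = nn.Conv2d(ic, oc, kernel_size, padding=padding, bias=False)
        self.wk = nn.Conv2d(ic, oc, kernel_size, padding=padding, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split input channels into the four quaternion components.
        r, i, j, k = torch.chunk(x, 4, dim=1)
        # Hamilton product: each output component mixes all input components.
        out_r = self.wr(r) - self.wi(i) - self.wj(j) - self.wk(k)
        out_i = self.wr(i) + self.wi(r) + self.wj(k) - self.wk(j)
        out_j = self.wr(j) - self.wi(k) + self.wj(r) + self.wk(i)
        out_k = self.wr(k) + self.wi(j) - self.wj(i) + self.wk(r)
        return torch.cat([out_r, out_i, out_j, out_k], dim=1)

# Example: a 4-channel quaternion input (e.g. a spectrogram packed as r, i, j, k).
y = QuaternionConv2d(4, 16)(torch.randn(2, 4, 64, 64))
print(y.shape)  # torch.Size([2, 16, 64, 64])
```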