2023
DOI: 10.1109/taffc.2022.3221749

Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition

Abstract: Despite recent advancements in speech emotion recognition (SER) models, state-of-the-art deep learning (DL) approaches face the challenge of the limited availability of annotated data. Large language models (LLMs) have revolutionised our understanding of natural language, introducing emergent properties that broaden comprehension in language, speech, and vision. This paper examines the potential of LLMs to annotate abundant speech data, aiming to enhance the state-of-the-art in SER. We evaluate this capability…
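As a rough illustration of the annotation idea described in the abstract, the snippet below pseudo-labels unlabelled speech transcripts with an LLM so they can augment an SER training set. This is a minimal sketch, not the paper's pipeline: `query_llm`, the prompt wording, and the four-way emotion set are all assumptions, and the stubbed return value only keeps the example runnable.

```python
# Minimal sketch (not the paper's code): using an LLM to pseudo-label
# unlabelled speech transcripts so they can augment SER training data.
# `query_llm` is a hypothetical stand-in for whichever LLM API is used.

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client."""
    # Stubbed here so the sketch runs end-to-end.
    return "neutral"

def pseudo_label(transcript: str) -> str:
    prompt = (
        "Classify the emotion of the speaker in this utterance as one of "
        f"{', '.join(EMOTIONS)}.\nUtterance: \"{transcript}\"\nEmotion:"
    )
    answer = query_llm(prompt).strip().lower()
    return answer if answer in EMOTIONS else "neutral"  # fall back on invalid output

# Augment the labelled corpus with weakly labelled data.
unlabelled = [("clip_001.wav", "I can't believe you did that!"),
              ("clip_002.wav", "Everything is fine, thanks.")]
augmented = [(path, pseudo_label(text)) for path, text in unlabelled]
print(augmented)
```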

Cited by 14 publications (6 citation statements)
References 98 publications
“…We chose it for the following reasons. Firstly, this database is often used in research on emotional speech [22,23] and is widely available. Secondly, as presented by the authors of [24], this database enables the highest efficiency of classification (88.47% [25]) compared to other databases, such as RAVDESS (87.5% [15,26]) or IEMOCAP (75.60% [27,28]).…”
Section: Audio Data (mentioning)
confidence: 99%
“…The hierarchical multitask learning framework is also proposed, taking the coarse classification and the fine classification as two tasks [28]. The augmentation of data and unsupervised reconstruction can be taken as an auxiliary task to avoid the difficulties caused by the data annotation [29]. Another, more complicated method obtains a multiscale unified metric [30] through multitask learning, where the classification of both Emotion States Category and Emotion Intensity Scale is the main task and phone recognition and gender recognition are the auxiliary tasks.…”
Section: Multitask Learning (mentioning)
confidence: 99%
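To make the auxiliary-task setup in the statement above concrete, here is a minimal PyTorch sketch, not the cited papers' code: a shared encoder feeds the main emotion classifier together with auxiliary gender-recognition and input-reconstruction heads, and the three losses are combined with fixed weights chosen purely for illustration.

```python
# Sketch of multitask SER with a shared encoder, a main emotion head,
# and auxiliary reconstruction and gender heads (illustrative only).
import torch
import torch.nn as nn

class MultitaskSER(nn.Module):
    def __init__(self, feat_dim=40, hidden=128, n_emotions=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        self.emotion_head = nn.Linear(hidden, n_emotions)  # main task
        self.gender_head = nn.Linear(hidden, 2)            # auxiliary task
        self.decoder = nn.Linear(hidden, feat_dim)         # auxiliary reconstruction

    def forward(self, x):
        z = self.encoder(x)
        return self.emotion_head(z), self.gender_head(z), self.decoder(z)

model = MultitaskSER()
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random stand-in data.
x = torch.randn(8, 40)                 # utterance-level acoustic features
y_emotion = torch.randint(0, 4, (8,))
y_gender = torch.randint(0, 2, (8,))
emo_logits, gen_logits, recon = model(x)
loss = ce(emo_logits, y_emotion) + 0.3 * ce(gen_logits, y_gender) + 0.3 * mse(recon, x)
opt.zero_grad(); loss.backward(); opt.step()
```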
“…The hierarchical multitask learning is proposed that uses the coarse classification and the fine classification as two tasks [28]. These methods use unsupervised reconstruction as an auxiliary task [29]. The more complicated method obtains the multiscale unified metric [30], where phone recognition and gender recognition are the auxiliary tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…To increase the size of the training data when the construction of a large-scale emotional speech corpus is difficult, several approaches [7][8][9][10][11][12][13] have used multiple emotional speech corpora in training. Another class of approaches tries to enhance the generalization capability by introducing a variety of regularization approaches and metric losses [7,11,[14][15][16][17][18][19][20][21][22][23].…”
Section: Introduction (mentioning)
confidence: 99%
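A small sketch of the second strategy mentioned in the statement above, under the assumption of a simple center-loss regulariser (one common metric loss; the cited works use a variety of regularisers and metric losses): utterances pooled from two corpora are classified with cross-entropy while the center loss pulls same-emotion embeddings together across corpora.

```python
# Illustrative sketch: pooled-corpus training with cross-entropy plus a
# center-loss regulariser on the shared embedding space (assumption, not
# any cited paper's exact recipe).
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, n_classes, dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_classes, dim))

    def forward(self, features, labels):
        # Mean squared distance of each embedding to its class center.
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

encoder = nn.Sequential(nn.Linear(40, 64), nn.ReLU())
classifier = nn.Linear(64, 4)
center_loss = CenterLoss(n_classes=4, dim=64)
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(encoder.parameters()) +
                       list(classifier.parameters()) +
                       list(center_loss.parameters()), lr=1e-3)

# Batches drawn from two pooled corpora (random stand-ins here).
x = torch.cat([torch.randn(4, 40), torch.randn(4, 40)])  # corpus A + corpus B
y = torch.randint(0, 4, (8,))
z = encoder(x)
loss = ce(classifier(z), y) + 0.1 * center_loss(z, y)
opt.zero_grad(); loss.backward(); opt.step()
```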