2024
DOI: 10.11591/eei.v13i3.6049

Enhancing speech emotion recognition with deep learning using multi-feature stacking and data augmentation

Khasyi Al Mukarram,
M. Anang Mukhlas,
Amalia Zahra

Abstract: This study evaluates the effectiveness of data augmentation for 1D convolutional neural network (CNN) and transformer models in speech emotion recognition (SER) on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset. The results show that data augmentation improves emotion classification accuracy. Augmentation techniques such as noising, pitching, stretching, shifting, and speeding are applied to increase data variation and overcome class imbalance. The 1D CNN model wi…
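As an illustration of the kind of waveform-level augmentation the abstract lists, here is a minimal sketch of three of the named techniques (noising, shifting, speeding) in plain Python. It is not the authors' implementation: the function names, the noise level, the shift offset, and the speed rate are all assumptions chosen for the example. Pitching and stretching are omitted because they need a phase-vocoder style routine; in practice an audio library such as librosa would usually handle those.

```python
import math
import random


def add_noise(signal, noise_level=0.005, rng=None):
    """Add zero-mean Gaussian noise scaled by the signal's peak amplitude.

    noise_level is a hypothetical default, not a value from the paper.
    """
    rng = rng or random.Random(0)
    peak = max((abs(x) for x in signal), default=0.0)
    return [x + rng.gauss(0.0, noise_level * peak) for x in signal]


def shift(signal, offset):
    """Circularly shift the signal by `offset` samples (positive = delay)."""
    if not signal:
        return []
    k = offset % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip.

    Note: plain resampling changes pitch as well as duration.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not signal:
        return []
    last = len(signal) - 1
    n_out = max(1, int(round(len(signal) / rate)))
    out = []
    for i in range(n_out):
        pos = min(i * rate, last)      # fractional read position
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


def augment(signal):
    """Return the original clip plus one variant per technique."""
    return {
        "original": list(signal),
        "noise": add_noise(signal),
        "shift": shift(signal, offset=len(signal) // 10),
        "speed": change_speed(signal, rate=1.25),
    }


# Stand-in for a speech clip: one second of a 440 Hz tone at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
variants = augment(tone)
```

Each training clip yields several variants this way, which is how augmentation increases data variation; applying more variants to under-represented emotion classes is one possible way to address class imbalance, though the abstract does not say how the authors did it.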

Cited by 0 publications
References 19 publications