2023
DOI: 10.11591/eei.v12i5.5031

Data augmentation and enhancement for multimodal speech emotion recognition

Abstract: Humans’ fundamental need is interaction with each other such as using conversation or speech. Therefore, it is crucial to analyze speech using computer technology to determine emotions. The speech emotion recognition (SER) method detects emotions in speech by examining various aspects. SER is a supervised method to decide the emotion class in speech. This research proposed a multimodal SER model using one of the deep learning based enhancement techniques, which is the attention mechanism. Additionally, this re…

Cited by 5 publications (3 citation statements)
References 18 publications
“…GANs can supplement training data for emotion detection models. GANs can assist increase the variety of the dataset by generating more synthetic samples, which can lead to better generalization and increased performance of the emotion detection model [52]. Sima et.…”
Section: Machine Learning Techniques Used For Emotion Detection
Citation type: mentioning
confidence: 99%
“…They play a pivotal role in identifying manipulated images, transforming images between domains, and ensuring high-fidelity visual inputs for automation tasks. GANs enhance the quality of audio signals by reducing noise, improving clarity, and aiding in speech recognition tasks [16]. In cybersecurity, GANs are instrumental in both fake image detection [13] and intrusion detection [17].…”
Section: Generative Adversarial Network For Data Augmentation
Citation type: mentioning
confidence: 99%
“…Additionally, image augmentation is also carried out by applying random flip and random rotation to the images to increase diversity and size of the training set. Data or image augmentation can improve accuracy of model and reduce overfitting effect because the model can learn with a wider variety of images [25]-[27].…”
Section: Data Preprocessing
Citation type: mentioning
confidence: 99%
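
The random flip and random rotation described in the last citation statement can be illustrated with a small sketch. This is not the citing authors' pipeline: the function names (`horizontal_flip`, `rotate_90`, `random_augment`, `augment_dataset`), the flip probability, and the restriction to quarter-turn rotations are all assumptions made for illustration, and images are represented as plain nested lists so the example needs only the standard library.

```python
import random


def horizontal_flip(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def random_augment(image, rng, flip_prob=0.5):
    """Return an augmented copy: an optional flip, then 0 to 3 quarter turns.

    Assumption: rotation is limited to multiples of 90 degrees so that no
    pixel interpolation is required.
    """
    out = [list(row) for row in image]
    if rng.random() < flip_prob:
        out = horizontal_flip(out)
    for _ in range(rng.randrange(4)):
        out = rotate_90(out)
    return out


def augment_dataset(images, copies_per_image, seed=0):
    """Keep the originals and append `copies_per_image` augmented variants of each."""
    rng = random.Random(seed)
    augmented = [[list(row) for row in img] for img in images]
    for img in images:
        for _ in range(copies_per_image):
            augmented.append(random_augment(img, rng))
    return augmented


if __name__ == "__main__":
    tiny_image = [[1, 2], [3, 4]]
    training_set = augment_dataset([tiny_image], copies_per_image=3, seed=1)
    print(len(training_set))  # one original plus three augmented copies
```

Each augmented copy contains the same pixel values in a different arrangement, which is the sense in which the training set grows in size and diversity without new data being collected.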