Data augmentation and enhancement for multimodal speech emotion recognition

Setyono, Jonathan Christian; Zahra, Amalia

doi:10.11591/eei.v12i5.5031

Cited by 5 publications

(3 citation statements)

References 18 publications


“…GANs can supplement training data for emotion detection models. GANs can assist increase the variety of the dataset by generating more synthetic samples, which can lead to better generalization and increased performance of the emotion detection model [52]. Sima et.…”

Section: Machine Learning Techniques Used For Emotion Detection (mentioning)

confidence: 99%

A Survey on Multi-modal Emotion Detection Techniques

Chatterjee, Shah, Bhatt et al. 2024

Preprint


The utilization of emotion detection and recognition technologies has revolutionized human-computer interactions in various fields such as sentiment analysis, health monitoring, education, and automotive interfaces. Previously, traditional systems relied on single-channel affect sensing, which limited their ability to capture the complexity of human emotions. However, humans naturally combine multiple cues such as facial expressions, speech, gestures, and contextual factors when expressing their emotions. As a result, there has been a growing interest in multi-modal emotion frameworks that integrate different sensory streams to obtain more comprehensive emotion assessments. These holistic perspectives allow for the capture of nuanced affective information that would otherwise be difficult to represent. In this survey paper, we delve into the latest advancements in emotion recognition systems, examining fusion techniques, feature engineering methods, and classification architectures that leverage inputs from various modalities such as vision, audio, and text. Our focus is to showcase innovative interventions throughout the entire pipeline, from preprocessing raw signals to predicting emotion labels, in order to enable robust multi-modal analysis. Through detailed theoretical discussions and practical case studies, this paper aims to inspire further research by providing insights into the current state-of-the-art, highlighting open challenges, and exploring promising avenues in emotion detection through cross-modal learning.



“…They play a pivotal role in identifying manipulated images, transforming images between domains, and ensuring high-fidelity visual inputs for automation tasks. GANs enhance the quality of audio signals by reducing noise, improving clarity, and aiding in speech recognition tasks [16]. In cybersecurity, GANs are instrumental in both fake image detection [13] and intrusion detection [17].…”

Section: Generative Adversarial Network For Data Augmentation (mentioning)

confidence: 99%

Implementing generative adversarial networks for increasing performance of transmission fault classification

Goswami, Roy, Kalavala et al. 2024

IJEECS


An electrical power system is a network that facilitates the sourcing, transfer, and distribution of electrical energy. In the traditional power system, there are eleven types of faults that can occur in the system. This paper focuses on the classification of these faults over a stretch of 100 kilometres. The dataset used is synthetic and generated from a simulated model using MATLAB/Simulink software. Data augmentation is carried out during training to improve the accuracy of the classification. An indirect training approach through generative adversarial network (GAN) is used to classify these overhead transmission line faults. The random forest (RF) classification is used as the base learning model on the original dataset and it achieves accuracy of 84%. However, the base learner RF when used on GAN model generated augmented faulty data, it performs exceptionally well achieving 99% accuracy. One of the recent state-of-art methods is compared with this approach.


“…Additionally, image augmentation is also carried out by applying random flip and random rotation to the images to increase diversity and size of the training set. Data or image augmentation can improve accuracy of model and reduce overfitting effect because the model can learn with a wider variety of images [25]-[27].…”

Section: Data Preprocessing (mentioning)

confidence: 99%

Skin cancer classification using EfficientNet architecture

Harahap, Husein, Kwok et al. 2024

Bulletin EEI


Skin cancer is one of the most common deadly diseases worldwide. Hence, skin cancer classification is becoming increasingly important because treatment in the early stages of skin cancer is much more effective and efficient. This study focuses on the classification of three common types of skin cancer, namely basal cell carcinoma (BCC), squamous cell carcinoma (SCC), and melanoma using EfficientNet architecture. The dataset is preprocessed and each image in the dataset is resized to 256×256 pixels prior to incorporation in later stages. We then train all types of EfficientNet starting from EfficientNet-B0 to EfficientNet-B7 and compare their performances. Based on the test results, all trained EfficientNet models are capable of producing good accuracy, precision, recall, and F1-score in skin cancer classification. Particularly, our designed EfficientNet-B4 model achieves 79.69% accuracy, 81.67% precision, 76.56% recall, and 79.03% F1-score as the highest among others. These results confirm that EfficientNet architecture can be utilized to classify skin cancer properly.

