The growing need for human-computer interaction has driven more advanced interaction methods, one of which is voice recognition. Developing a voice command system also requires considering the user's emotional state, because users tend to treat computers much as they would other people. By recognizing a person's emotion, the computer can adjust the feedback it gives, making the human-computer interaction (HCI) process feel more natural. Previous research shows that improving the accuracy of human emotion recognition remains a challenge, because not all emotions are expressed in the same way, particularly across languages and cultural accents. This study proposes recognizing emotion types from speech using multi-feature extraction and deep learning. The dataset is taken from the RAVDESS database, from which MFCC, Chroma, Mel-Spectrogram, Contrast, and Tonnetz features are extracted. Principal Component Analysis (PCA) and Min-Max normalization are then applied to determine the impact of these techniques on performance. The pre-processed data are fed to a Deep Neural Network (DNN) model to identify eight emotion types: calm, happy, sad, angry, neutral, fearful, surprised, and disgusted. Model performance is evaluated using the confusion matrix. The DNN model achieves an accuracy of 93.61%, a sensitivity of 73.80%, and a specificity of 96.34%. The use of multiple features in the proposed method improves the model's accuracy in classifying emotion types on the RAVDESS dataset. In addition, applying PCA strengthens the pattern correlation between features, so the classifier shows further improvements in accuracy, specificity, and sensitivity.
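To make the described pipeline concrete, the sketch below shows one way the feature extraction, pre-processing, and classification steps could be implemented in Python with librosa, scikit-learn, and Keras. It is an illustrative approximation under assumed settings (number of MFCC coefficients, PCA variance threshold, network layer sizes, and training details are placeholders), not the exact implementation evaluated in this study.

```python
# Illustrative sketch only: multi-feature extraction, Min-Max + PCA pre-processing,
# and a simple fully connected DNN. Hyperparameters are assumptions, not values
# reported in this study.
import numpy as np
import librosa
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgusted", "surprised"]

def extract_features(path):
    """Combine MFCC, Chroma, Mel-Spectrogram, Contrast, and Tonnetz into one vector."""
    y, sr = librosa.load(path, sr=None)
    stft = np.abs(librosa.stft(y))
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr), axis=1)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr), axis=1)
    return np.concatenate([mfcc, chroma, mel, contrast, tonnetz])

def build_dataset(paths, labels):
    """Min-Max normalization followed by PCA, as in the described pre-processing stage."""
    X = np.array([extract_features(p) for p in paths])
    y = np.array([EMOTIONS.index(l) for l in labels])
    X = MinMaxScaler().fit_transform(X)
    X = PCA(n_components=0.95).fit_transform(X)  # retain 95% variance (assumed setting)
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

def build_dnn(input_dim, n_classes=len(EMOTIONS)):
    """A small fully connected network; the study's exact architecture may differ."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Accuracy, sensitivity, and specificity can then be derived per class from the confusion matrix of the trained model's predictions (e.g., with scikit-learn's confusion_matrix) and averaged across the eight emotion classes.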