Detecting emotions from human speech: role of gender information

Gupta, Manish; Patel, Tirth; Mankad, Sapan H.; Vyas, Tarjni

doi:10.1109/tensymp54529.2022.9864557

Cited by 2 publications

(12 citation statements)

References 22 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…This technique involves generating new data from existing databases through slight modifications. Standard data augmentation techniques within the SER field include adding noise, stretching, altering pitch, and shifting [10], [12], [13].…”

Section: Related Workmentioning

confidence: 99%

“…There are various methods for extracting audio characteristics from databases. The literature has employed techniques such as Mel-Frequency Cepstral Coefficients (MFCC) [13], [15], Zero Crossing Rate (ZCR) [3], Chromagram [9], Mel Spectrogram [10], and Root Mean Square (RMS) values [12]. The choice of feature extraction technique directly influences the classifier's performance and the resulting outcomes.…”

Section: Related Workmentioning

confidence: 99%

“…In [10] compares the impact of data augmentation and feature extraction techniques on speech emotion recognition. The authors apply data augmentation techniques, including noise, stretching, and pitch shifting.…”

Section: Related Workmentioning

confidence: 99%

“…It divides the frequency spectrum into evenly spaced Mel scale frequencies, producing a Mel spectrogram for each window. Then, it decomposes the signal's magnitude into components corresponding to the Mel frequencies [22] [10]. The study extracted the values from the datasets using the Librosa library.…”

Section: B Data Augmentationmentioning

confidence: 99%

See 3 more Smart Citations

Optimizing Speech Emotion Recognition:Evaluating Combinations of Databases, Data Augmentation, and Feature Extraction Methods

Ottoni,

Cerqueira

2023

Anais Do XVI Congresso Brasileiro De Inteligência Computacional

View full text Add to dashboard Cite

Speech emotion recognition is a challenging and essential task with numerous applications in human-computer interaction, healthcare, and entertainment. However, achieving high accuracy in this task is complicated by the need to select the best combination of machine learning algorithms, databases, data augmentation techniques, and feature extraction methods. This paper discusses the difficulty of choosing appropriate combinations of these factors and proposes a methodology to address this challenge. The proposed method evaluates the performance of various combinations of databases, data augmentation techniques, and feature extraction methods to determine the most effective approach for speech emotion recognition. The paper also presents a convolutional neural network to classify the emotions of happiness, sadness, fear, anger, surprise, disgust, and neutral. The results showed that the optimal combination proposed, with 94% accuracy, uses the combined RAVDESS and TESS databases, using data augmentation with noise, stretch, and pitch, and using MFCC to extract the characteristics of the audios.

show abstract

Section: Related Workmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

Section: Related Workmentioning

confidence: 99%

Section: B Data Augmentationmentioning

confidence: 99%

See 2 more Smart Citations

Optimizing Speech Emotion Recognition:Evaluating Combinations of Databases, Data Augmentation, and Feature Extraction Methods

Ottoni,

Cerqueira

2023

Anais Do XVI Congresso Brasileiro De Inteligência Computacional

View full text Add to dashboard Cite

show abstract

“…In the early days of SER research, the primary focus was on probabilistic models such as Hidden Markov Models (HMMs) [6,7] and Gaussian Mixture Models (GMMs) [8,9]. Recently, with the advent of deep learning, the landscape of emotion recognition has significantly shifted towards neural-network-based approaches [4,10]. Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Deep Neural Networks (DNNs) now play a predominant role in advancing speech emotion recognition [11].…”

Section: Introductionmentioning

confidence: 99%

A Deep Learning Approach for Speech Emotion Recognition Optimization Using Meta-Learning

Ottoni,

Cerqueira

2023

Electronics

View full text Add to dashboard Cite

Speech emotion recognition (SER) is widely applicable today, benefiting areas such as entertainment, robotics, and healthcare. This emotional understanding enhances user-machine interaction, making systems more responsive and providing more natural experiences. In robotics, SER is useful in home assistance devices, eldercare, and special education, facilitating effective communication. Additionally, in healthcare settings, it can monitor patients’ emotional well-being. However, achieving high levels of accuracy is challenging and complicated by the need to select the best combination of machine learning algorithms, hyperparameters, datasets, data augmentation, and feature extraction methods. Therefore, this study aims to develop a deep learning approach for optimal SER configurations. It delves into the domains of optimizer settings, learning rates, data augmentation techniques, feature extraction methods, and neural architectures for the RAVDESS, TESS, SAVEE, and R+T+S (RAVDESS+TESS+SAVEE) datasets. After finding the best SER configurations, meta-learning is carried out, transferring the best configurations to two additional datasets, CREMA-D and R+T+S+C (RAVDESS+TESS+SAVEE+CREMA-D). The developed approach proved effective in finding the best configurations, achieving an accuracy of 97.01% for RAVDESS, 100% for TESS, 90.62% for SAVEE, and 97.37% for R+T+S. Furthermore, using meta-learning, the CREMA-D and R+T+S+C datasets achieved accuracies of 83.28% and 90.94%, respectively.

show abstract

Detecting emotions from human speech: role of gender information

Cited by 2 publications

References 22 publications

Optimizing Speech Emotion Recognition:Evaluating Combinations of Databases, Data Augmentation, and Feature Extraction Methods

Optimizing Speech Emotion Recognition:Evaluating Combinations of Databases, Data Augmentation, and Feature Extraction Methods

A Deep Learning Approach for Speech Emotion Recognition Optimization Using Meta-Learning

Contact Info

Product

Resources

About