2019 5th International Conference on Computing, Communication, Control and Automation (ICCUBEA) 2019
DOI: 10.1109/iccubea47591.2019.9129067
Speech Emotion Recognition using MFCC features and LSTM network

Cited by 33 publications (8 citation statements) · References 5 publications
“…Most prior research uses CNN-based models for SER [37]. Among such models, the notable ones include AlexNet [38], VGG [39,40], and ResNet50 [41,42]. This section provides a short overview of the models.…”
Section: Architectures and Settings
confidence: 99%
“…MFCC is widely used to analyze speech signals and has performed well for speech-based emotion recognition systems compared to other features. In [92], MFCC feature extraction is used, and 39 coefficients are extracted. Long Short-Term Memory (LSTM) is implemented for emotion recognition.…”
Section: Review of Speech Emotion Recognition
confidence: 99%
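A 39-coefficient MFCC vector conventionally stacks 13 static coefficients with their delta and delta-delta (acceleration) terms; the cited work does not spell out its split, so the 13 + 13 + 13 layout below is an assumption. This sketch builds such per-frame vectors with a standard regression-based delta, producing a sequence an LSTM could consume:

```python
import numpy as np

def delta(feat, width=2):
    """Regression-based delta over +/- `width` frames (edge-padded)."""
    n = len(feat)
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(feat, dtype=float)
    for k in range(1, width + 1):
        out += k * (padded[width + k : width + k + n]
                    - padded[width - k : width - k + n])
    return out / denom

def mfcc_39(static):
    """Stack static, delta, delta-delta: (frames, 13) -> (frames, 39)."""
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])

frames = np.random.randn(100, 13)  # stand-in for real 13-D MFCC frames
X = mfcc_39(frames)
print(X.shape)  # (100, 39)
```

The resulting `(frames, 39)` matrix is the usual input shape for a frame-level LSTM over an utterance.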
“…Unlike most of the previous studies in the literature, the method of our feature fusion was inspired by the way that conventional speech features (e.g., Mel-Frequency Cepstral Coefficients (MFCCs)) are computed. That is, 32D Low-Level Descriptor (LLD) features, including 12D Chroma [22] and 20D MFCC [23], are extracted. The High-Level Statistical Functions (HSF), such as the mean of Chroma and the mean, variance, and maximum of MFCC, are calculated accordingly.…”
Section: Emotion Feature Extraction
confidence: 99%
“…Each frame was Z-normalized. From each frame, 32D Low-Level Descriptor (LLD) features, including 12D Chroma [23] and 20D MFCC [24], were extracted. The High-Level Statistical Functions (HSF), such as the mean of Chroma and the mean, variance, and maximum of MFCC, were calculated.…”
Section: Feature Extraction
confidence: 99%
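The two statements above describe the same fusion pattern: frame-level LLDs (12D Chroma + 20D MFCC) are pooled into a fixed-length utterance descriptor via HSFs. A minimal sketch, assuming the statistics are exactly those named (mean of Chroma; mean, variance, and max of MFCC) and concatenated in that order, which gives 12 + 3 × 20 = 72 dimensions; the cited papers may use a different ordering or additional functionals:

```python
import numpy as np

def hsf_fuse(chroma, mfcc):
    """Pool frame-level LLDs into one utterance-level vector:
    mean of 12-D Chroma, plus mean, variance, and max of 20-D MFCC,
    i.e. a 12 + 3*20 = 72-D descriptor (illustrative layout)."""
    return np.concatenate([
        chroma.mean(axis=0),  # 12-D: mean Chroma
        mfcc.mean(axis=0),    # 20-D: mean MFCC
        mfcc.var(axis=0),     # 20-D: variance of MFCC
        mfcc.max(axis=0),     # 20-D: max of MFCC
    ])

chroma = np.random.rand(200, 12)   # stand-in frame-level Chroma
mfcc = np.random.randn(200, 20)    # stand-in frame-level MFCC
v = hsf_fuse(chroma, mfcc)
print(v.shape)  # (72,)
```

Pooling with statistics like these is what lets variable-length utterances be fed to fixed-input classifiers.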