2023
DOI: 10.1016/j.specom.2022.11.005

Modulation spectral features for speech emotion recognition using deep neural networks

Cited by 32 publications (3 citation statements)
References 93 publications
“…Human emotions are the key elements in human communication, and speech is fundamental to emotional expression. Speech emotion recognition (SER) aims to distinguish and categorise emotions by extracting and modelling the speech data, such as the short‐term energy, zero‐crossing rate, pitch, formant, prosody, spectral features, temporal features, and Mel frequency cepstral coefficient (MFCC) features [1, 2]. The distributions of the temporal structure, loudness, fundamental frequency, and formant structures are associated with the differences in the emotions communicated in speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…Singh et al [6] proposed an approach for speech emotion recognition using a deep neural network with support vector machine (DNN-SVM) model. The proposed model employed the constant-Q transform based modulation spectral features (CQT-MSF) to process the speech data.…”
Section: A Traditional Machine Learning (mentioning)
confidence: 99%
“…Many voice features have been discovered by the academicians and other researchers. Some of these include formant frequency [22]. Formant frequencies are a sort of resonance frequencies.…”
Section: Introduction (mentioning)
confidence: 99%