2019
DOI: 10.1016/j.specom.2019.09.002

Automatic speech emotion recognition using an optimal combination of features based on EMD-TKEO

Abstract: In this paper, we propose a global approach for a speech emotion recognition (SER) system using empirical mode decomposition (EMD). The use of EMD is motivated by the fact that, combined with the Teager-Kaiser Energy Operator (TKEO), it gives an efficient time-frequency analysis of non-stationary signals. In this method, each signal is decomposed using EMD into oscillating components called intrinsic mode functions (IMFs). TKEO is used for estimating the time-varying amplitude envelope and instantaneous frequenc…
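The decomposition step described in the abstract can be illustrated with a minimal EMD sifting sketch in Python. It is not the authors' implementation. The function names (`envelope_mean`, `emd`), the cubic-spline envelopes built with SciPy's `CubicSpline`, pinning both envelopes to the end samples, and the fixed count of 10 sifting passes used as a stopping rule are all assumptions chosen for brevity.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def envelope_mean(h):
    """Mean of the upper and lower cubic-spline envelopes of h.

    Returns None when h has fewer than two interior maxima or minima,
    i.e. too few extrema to be sifted further.
    """
    n = np.arange(len(h))
    d = np.diff(h)
    # Interior extrema from sign changes of the first difference.
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Crude boundary handling (assumption): pin both envelopes to the end samples.
    last = len(h) - 1
    max_idx = np.concatenate(([0], maxima, [last]))
    min_idx = np.concatenate(([0], minima, [last]))
    upper = CubicSpline(max_idx, h[max_idx])(n)
    lower = CubicSpline(min_idx, h[min_idx])(n)
    return 0.5 * (upper + lower)


def emd(x, max_imfs=8, n_sift=10):
    """Split x into IMFs and a residue such that sum(imfs) + residue == x."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residue) is None:
            break  # residue has too few extrema: decomposition stops
        h = residue.copy()
        # Fixed number of sifting passes as the stopping rule (assumption).
        for _ in range(n_sift):
            m = envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return imfs, residue


# Usage: a 50 Hz tone plus a slower 4 Hz component, sampled at 1 kHz.
fs = 1000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 4 * t)
imfs, residue = emd(signal)
```

By construction the IMFs plus the residue always sum back to the input. For well-separated components like the ones in the usage lines, the first IMF should track the fastest oscillation (here the 50 Hz tone). Production EMD code typically uses more careful boundary extension and adaptive stopping criteria than this sketch.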

Citations: cited by 89 publications (54 citation statements)
References: 34 publications
“…Kerkeni et al. proposed an automated SER system based on a combination of features obtained in the EMD domain. They used modulation spectral features and modulation frequency features based on the IMF signals and combined them with cepstral features [49]. Their methodology was initially evaluated on the Spanish emotional database using an RNN classifier and reported an accuracy of 91.16%.…”
Section: Discussion (mentioning; confidence: 99%)
“…They used PCA for feature transformation with a quadratic-kernel SVM as the classification algorithm, which achieved an average accuracy of 60.1% on the RAVDESS database. Kerkeni et al. [50] proposed a model for recognizing speech emotion using empirical mode decomposition (EMD) based on optimal features that included Mel frequency cepstral coefficients of the reconstructed signal (SMFCC), energy cepstral coefficients (ECC), modulation frequency features (MFF), modulation spectral (MS) features and frequency weighted energy cepstral coefficients (FECC). They achieved an average accuracy of 91.16% on the Spanish database using an RNN for classification.…”
Section: Related Studies (mentioning; confidence: 99%)
“…The resulting feature dimension was reduced using the vector quantization method, and the obtained feature vector was used as input to a Radial Basis Function Neural Network (RBFNN) classifier. Recently, in [21], the authors proposed a combination of Empirical Mode Decomposition (EMD) with the Teager-Kaiser Energy Operator (TKEO). They proposed novel features named Modulation Spectral (MS) features and Modulation Frequency Features (MFF) based on the AM-FM modulation model and combined them with cepstral features.…”
Section: Review of Feature Level Fusion Based SER (mentioning; confidence: 99%)
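The abstract says TKEO is used to estimate the time-varying amplitude envelope and instantaneous frequency, and the last statement above refers to an AM-FM modulation model. One standard way to turn TKEO into those two estimates is the DESA-2 discrete energy separation algorithm of Maragos, Kaiser and Quatieri. The sketch below assumes that variant, since this excerpt does not say which one the paper uses. The function names `teager_kaiser` and `desa2` are hypothetical.

```python
import numpy as np


def teager_kaiser(x):
    """Discrete TKEO: psi[n] = x[n]**2 - x[n-1] * x[n+1], for n = 1..N-2."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 amplitude envelope and instantaneous frequency (rad/sample).

    Estimates are returned for samples n = 2..N-3 and are valid for
    frequencies below fs/4 (omega <= pi/2), the range of 0.5 * arccos.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_kaiser(x)            # covers n = 1..N-2
    y = x[2:] - x[:-2]                  # y[n] = x[n+1] - x[n-1], n = 1..N-2
    psi_y = teager_kaiser(y)            # covers n = 2..N-3
    psi_x = psi_x[1:-1]                 # align psi_x to n = 2..N-3
    psi_x = np.maximum(psi_x, eps)      # guard against division by zero
    psi_y = np.maximum(psi_y, eps)
    ratio = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(ratio)      # instantaneous frequency
    amp = 2.0 * psi_x / np.sqrt(psi_y)  # amplitude envelope
    return amp, omega


# Usage: a 440 Hz cosine of amplitude 0.7 sampled at 8 kHz.
fs = 8000
n = np.arange(400)
tone = 0.7 * np.cos(2 * np.pi * 440 / fs * n + 0.3)
amp, omega = desa2(tone)
freq_hz = omega * fs / (2 * np.pi)
```

For a pure cosine A cos(Ωn + φ), TKEO gives exactly A² sin²Ω, and DESA-2 recovers A and Ω at every interior sample. Applied to an IMF, the estimates vary from sample to sample and would then feed the modulation features mentioned above.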