2020
DOI: 10.1109/access.2020.2991811

Emotion Recognition From Speech Using Wavelet Packet Transform Cochlear Filter Bank and Random Forest Classifier

Abstract: This research aims to design and implement an artificial emotional intelligence system that is capable of identifying the unknown emotion of the speaker. To that end, we propose a novel framework for emotion recognition in the presence of noise and interference. Our approach accounts for energy, time and spectral parameters to examine the emotion of the speaker. However, rather than using Gammatone filterbank and short-time Fourier transform (STFT), commonly adopted in the literature, we propose employing a no…

Citations: cited by 38 publications (19 citation statements)
References: 29 publications
“…The calculation process for the reset gate is shown in equation (6), while the calculation of GRU output is shown in equation (8). The key distinction between vanilla RNNs and GRUs is that the latter supports gating of the hidden state.…”
Section: Gated Recurrent (mentioning)
confidence: 99%
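
The gating described in the statement above can be illustrated with a minimal sketch of one GRU step next to a vanilla RNN step. This uses the commonly published GRU formulation for a single hidden unit; it is not taken from the citing paper, whose equations (6) and (8) are not reproduced here. The parameter names (`wz`, `uz`, `bz`, and so on) are hypothetical, and some references swap the roles of `z` and `1 - z` in the final blend.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rnn_step(x, h_prev, w, u, b):
    """Vanilla RNN step: the hidden state is always fully overwritten."""
    return math.tanh(w * x + u * h_prev + b)


def gru_step(x, h_prev, p):
    """One step of a single-unit GRU.

    p is a dict of scalar weights; the key names are illustrative only.
    """
    # Update gate: how much of the candidate state replaces the old state.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the old state feeds into the candidate.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate state, computed from the input and the reset-scaled old state.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Gated blend of old state and candidate state.
    return (1.0 - z) * h_prev + z * h_cand


# Usage: a strongly negative update-gate bias closes the gate,
# so the previous hidden state is carried over almost unchanged.
params = {k: 0.0 for k in ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}
closed = dict(params, bz=-50.0)
h_kept = gru_step(1.0, 0.8, closed)
```

With the update gate closed, the GRU keeps its previous state, which a vanilla RNN step cannot do because it has no gate on the hidden state.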
“…But situations for recording sounds might be stressful, or the collected dataset may contain noise. Hamsa et al [8] proposed a method for Arabic emotion recognition in stressful and noisy situations. They proposed a model based on novel wavelet packet transform as denoising techniques and a random forest model for emotion recognition.…”
Section: Introduction (mentioning)
confidence: 99%
“…After normalizing the CNN features, temporal information is learned by moving the components to the deep bi-directional long-term memory (BiLSTM). Hamsa et al [21] proposed a noise and interference-resilient SER system. The speaker's emotion is examined by considering the energy, spectral, and time parameters.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…The prediction value is constructed from individual decision trees by using different aggregation policies [15] like majority voting, computing mean, or median, depending on the problem type. Due to its simplicity and efficiency, the Random Forest model has been successfully applied to several practical problems like patient health prediction [2], image classification [3], emotion recognition [4], malware detection [5], [16] and user response prediction [7], [17].…”
Section: Related Work (mentioning)
confidence: 99%
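
The aggregation policies named in the statement above (majority voting, mean, median) can be sketched in a few lines. This is a minimal illustration that assumes the per-tree outputs have already been collected into a list; the function name `aggregate` and the `policy` argument are hypothetical and do not come from any of the cited papers or from a specific library.

```python
from collections import Counter
from statistics import mean, median


def aggregate(tree_outputs, policy="majority"):
    """Combine per-tree predictions into one forest prediction.

    tree_outputs: list of class labels (for "majority") or numbers
    (for "mean" and "median"), one entry per decision tree.
    """
    if policy == "majority":
        # Classification: the most frequent label wins. On a tie,
        # Counter returns the label that was encountered first.
        return Counter(tree_outputs).most_common(1)[0][0]
    if policy == "mean":
        # Regression: average of the tree outputs.
        return mean(tree_outputs)
    if policy == "median":
        # Regression: middle value, less sensitive to outlier trees.
        return median(tree_outputs)
    raise ValueError(f"unknown policy: {policy}")


# Usage: three trees vote on an emotion label.
label = aggregate(["angry", "happy", "angry"])
```

Majority voting suits classification tasks such as emotion recognition, while mean and median apply when each tree outputs a number.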