SUMMARY
Speech is one of the fastest and most natural modes of communication between humans. Emotion recognition from speech has recently become a popular research topic in speech signal processing, with applications in noncontact clinical settings, remote patient monitoring, and man–machine interaction. The basic principle behind speech emotion recognition is to analyze the different emotions a speaker expresses when speaking the same utterance under different emotional circumstances. This study formulates a novel speech emotion recognition method in two major phases: spectral feature extraction and recognition. First, the spectral skewness, delta AMS, spectral flux, spectral kurtosis, spectral spread, and spectral slope features are extracted from the input speech signal and passed to the proposed ensemble classifier, which determines the corresponding emotion. The ensemble classifier comprises three neural networks (neural network 1, neural network 2, and neural network 3), a random forest, and a recurrent neural network (RNN). The final recognition result is obtained from the RNN, which is enhanced by tuning its weights. A new improved cat‐swarm optimization algorithm is used to solve this weight optimization problem. Finally, the results are compared with those of other state‐of‐the‐art algorithms.
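
To make the feature extraction phase more concrete, the following is a minimal Python sketch of how most of the listed spectral descriptors could be computed for a single speech frame. It uses common textbook, moment-based definitions, which are an assumption here and may differ from the exact formulations used in the study. The function name `spectral_features`, the Hann window, the frame length, and the 16 kHz sample rate are all hypothetical choices made for illustration, and the delta AMS feature is omitted.

```python
import numpy as np


def spectral_features(frame, sample_rate, prev_mag=None):
    """Illustrative spectral descriptors for one frame (assumed definitions)."""
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))  # assumed window choice
    mag = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    eps = 1e-12  # guards against division by zero on silent frames
    p = mag / (mag.sum() + eps)  # magnitude spectrum normalised to sum to 1

    # Centroid is not listed in the summary, but the other moments are
    # defined relative to it.
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
    skewness = np.sum(((freqs - centroid) ** 3) * p) / (spread + eps) ** 3
    kurtosis = np.sum(((freqs - centroid) ** 4) * p) / (spread + eps) ** 4

    # Slope: gradient of a least-squares line fitted to magnitude vs frequency.
    slope = np.polyfit(freqs, mag, 1)[0]

    # Flux: squared change in magnitude spectrum relative to the previous frame.
    flux = 0.0 if prev_mag is None else float(np.sum((mag - prev_mag) ** 2))

    features = {
        "centroid": float(centroid),
        "spread": float(spread),
        "skewness": float(skewness),
        "kurtosis": float(kurtosis),
        "slope": float(slope),
        "flux": flux,
    }
    return features, mag


# Usage on a synthetic 1 kHz tone (a stand-in for a real speech frame).
sample_rate = 16000
t = np.arange(512) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
tone_features, tone_mag = spectral_features(tone, sample_rate)
```

In a full pipeline, one such feature vector per frame would presumably be assembled into a sequence and passed to the classifiers; how the study combines frames and feeds the ensemble is not specified in this summary.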