Speech processing is an essential area of digital signal processing concerned with the analysis of speech signals. It is applied in fields such as emotion recognition, virtual assistants, and voice identification. Among these applications, emotion recognition is one of the most critical because it is used to identify a person's emotional state and to help address physiological issues. Several researchers have combined signal processing with machine learning techniques to identify human emotions. However, existing approaches struggle to recognize emotions with both low computational complexity and high accuracy. This paper introduces an intelligent computational technique called the cat swarm optimized spiking neural network (CSSPNN). First, emotional speech signals are collected from the Toronto emotional speech set (TESS) dataset and processed with a wavelet approach to extract features. The derived features are then examined by the CSSPNN classifier, which recognizes human emotions through its effective training and learning process. Finally, the proficiency of the system is evaluated through experimental results and discussion. The proposed system recognizes speech emotions with up to 99.3% accuracy in comparison with recurrent neural networks (RNNs), deep neural networks (DNNs) and deep shallow neural networks (DSNNs).
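The wavelet feature-extraction step can be illustrated with a minimal sketch. The abstract does not state which wavelet family, decomposition depth, or feature set is used, so everything below is an assumption made for illustration: a Haar wavelet, three decomposition levels, and sub-band energies as features. The function names `haar_dwt` and `wavelet_energy_features` are hypothetical, and a synthetic sine frame stands in for a TESS utterance.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists.
    Assumption: Haar is used only for illustration; the paper's
    wavelet family is not given in the abstract.
    """
    signal = list(signal)
    if len(signal) % 2:
        # Pad odd-length input by repeating the last sample.
        signal.append(signal[-1])
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_features(signal, levels=3):
    """Energy of the detail band at each level, then the final approximation energy.

    Assumption: sub-band energy is a stand-in feature, not the paper's feature set.
    """
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


# Synthetic 64-sample tone standing in for one frame of a speech recording.
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
features = wavelet_energy_features(frame, levels=3)
print(features)
```

A feature vector of this kind (one value per sub-band) is what a downstream classifier would consume. Because the Haar transform is orthonormal, the sub-band energies sum to the energy of the original frame when no padding is needed, which gives a quick sanity check on the decomposition.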