PurposeNowadays, the speech emotion recognition (SER) model has enhanced as the main research topic in various fields including human–computer interaction as well as speech processing. Generally, it focuses on utilizing the models of machine learning for predicting the exact emotional status from speech. The advanced SER applications go successful in affective computing and human–computer interaction, which is making as the main component of computer system's next generation. This is because the natural human machine interface could grant the automatic service provisions, which need a better appreciation of user's emotional states.Design/methodology/approachThis paper implements a new SER model that incorporates both gender and emotion recognition. Certain features are extracted and subjected for classification of emotions. For this, this paper uses deep belief network DBN model.FindingsThrough the performance analysis, it is observed that the developed method attains high accuracy rate (for best case) when compared to other methods, and it is 1.02% superior to whale optimization algorithm (WOA), 0.32% better from firefly (FF), 23.45% superior to particle swarm optimization (PSO) and 23.41% superior to genetic algorithm (GA). In case of worst scenario, the mean update of particle swarm and whale optimization (MUPW) in terms of accuracy is 15.63, 15.98, 16.06% and 16.03% superior to WOA, FF, PSO and GA, respectively. Under the mean case, the performance of MUPW is high, and it is 16.67, 10.38, 22.30 and 22.47% better from existing methods like WOA, FF, PSO, as well as GA, respectively.Originality/valueThis paper presents a new model for SER that aids both gender and emotion recognition. For the classification purpose, DBN is used and the weight of DBN is used and this is the first work uses MUPW algorithm for finding the optimal weight of DBN model.