In recent years, the application of speech emotion recognition (SER) to the supervision of Internet public opinion has received increasing attention. This study proposes a new SER algorithm for analyzing public opinion information on network platforms. First, we extract several spectral features from speech signals and combine them into frame-level speech features. Then, we select the conditional deep belief network (CDBN), which can learn sequential features, as the final classification model. We apply particle swarm optimization (PSO) and a genetic algorithm (GA) during the fine-tuning stage of the CDBN to obtain more suitable weights for the whole network, and we call the resulting model PSO-GA-CDBN (PGCDBN). Compared with the traditional back propagation (BP) algorithm, our training method accelerates the convergence of the network and improves its robustness and recognition performance. In our experiments, we used the Chinese emotional corpus of the Institute of Automation, Chinese Academy of Sciences (CASIA) and self-collected Chinese speech datasets gathered from Sina Weibo, TikTok and other online social media platforms. Compared with popular emotion classifiers such as the support vector machine (SVM), deep residual network (ResNet), long short-term memory (LSTM) neural network and deep belief network (DBN), our proposed PGCDBN achieves the best recognition results on both datasets. In addition, we place a bidirectional LSTM before the PGCDBN to further process the extracted speech features, and the output of the bidirectional LSTM represents the speech signal more expressively. This hybrid deep learning model reaches an average recognition accuracy of 98.67% across the two datasets, which suggests that it can be used for the supervision of netizens' opinions.
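As an illustration of the fine-tuning idea, the following is a minimal sketch of a hybrid PSO-GA search over a weight vector. It is not the PGCDBN training procedure itself: the function names (`pso_ga_minimize`, `toy_loss`), the hyperparameters, the uniform crossover and Gaussian mutation operators, and the toy quadratic loss standing in for the network's fine-tuning error are all assumptions made for this example.

```python
import random


def pso_ga_minimize(loss, dim, n_particles=20, n_iters=100, seed=0,
                    inertia=0.7, c1=1.5, c2=1.5, mutation_rate=0.1):
    """Hypothetical hybrid PSO-GA search for a weight vector minimizing `loss`."""
    rng = random.Random(seed)
    # Each particle is one candidate weight vector.
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_loss[i])
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            # PSO step: pull the particle toward its personal best
            # and toward the best position found by the swarm.
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]

            # GA step (assumed operators): uniform crossover with another
            # particle's personal best, then sparse Gaussian mutation.
            mate = pbest[rng.randrange(n_particles)]
            child = [pos[i][d] if rng.random() < 0.5 else mate[d]
                     for d in range(dim)]
            child = [w + rng.gauss(0.0, 0.1) if rng.random() < mutation_rate
                     else w for w in child]
            # Keep the child only if it improves on the PSO position.
            if loss(child) < loss(pos[i]):
                pos[i] = child

            cur = loss(pos[i])
            if cur < pbest_loss[i]:
                pbest[i], pbest_loss[i] = pos[i][:], cur
                if cur < gbest_loss:
                    gbest, gbest_loss = pos[i][:], cur
    return gbest, gbest_loss


def toy_loss(w, target=(0.5, -0.3, 0.2)):
    """Toy quadratic loss; a stand-in for the network's fine-tuning error."""
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


best_w, best_loss = pso_ga_minimize(toy_loss, dim=3)
```

In an actual network, `loss` would evaluate the classification error of the CDBN for a given flattened weight vector, and the swarm would typically be initialized from the pre-trained weights rather than uniformly at random.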