In this paper, we construct a speech emotion recognition model based on a convolutional neural network (CNN). We analyze the classrooms that the network identifies with a given degree of confidence, together with the schools represented in the dataset, and use big data to find the characteristics and patterns of how teachers currently control classroom emotion and to identify the components of classroom emotion. Based on these characteristics, we design a classroom emotion recognition model built on the CNN speech emotion algorithm. We also investigate the factors and patterns of teachers’ emotional control in the classroom. An existing neural network is adapted and improved, and the dataset is preprocessed before it is used to train the network. The network combines a CNN with a recurrent neural network (RNN), using the CNN for feature extraction and the RNN for its memory capability in sequence modeling. This network performs well on both object labeling and speech recognition. To extract emotion features from whole-sentence speech, we propose an attention-based emotion recognition algorithm for variable-length speech. We design a spatiotemporal attention module for the speech emotion algorithm and a convolutional channel attention module for the CNN, so that the attention mechanism reduces the contribution of unimportant parts of the spatiotemporal data and of the CNN convolutional channel features in subsequent recognition. In turn, the weight of the key data and features is increased, which improves the recognition accuracy of the model.
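As an illustration of the two attention ideas described above, the following is a minimal sketch in plain Python and not the implementation used in this paper. It assumes that each utterance has already been turned into a sequence of frame-level feature vectors (for example, by the CNN and RNN layers). The function names `temporal_attention_pool` and `channel_attention`, the scoring vector `w`, and the per-channel gate parameters are hypothetical and stand in for learned parameters; the channel gate is a simplified variant that scales each channel independently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frames, w):
    """Pool a variable-length sequence into one fixed-length vector.

    frames: list of T feature vectors, each of length D (T may differ
            between utterances).
    w:      hypothetical scoring vector of length D (assumed learned).
    Returns (pooled vector of length D, attention weights of length T).
    """
    # One scalar relevance score per frame.
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
    # Normalize the scores over time so the weights sum to 1.
    alphas = softmax(scores)
    dim = len(frames[0])
    # Weighted sum over time: low-weight frames contribute little.
    pooled = [
        sum(a * frame[d] for a, frame in zip(alphas, frames))
        for d in range(dim)
    ]
    return pooled, alphas


def channel_attention(frames, gate_weights, gate_bias):
    """Rescale each feature channel by a gate in (0, 1).

    Simplified assumption: each gate depends only on the time-averaged
    value of its own channel, through a sigmoid.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    # Squeeze: average each channel over time.
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    # Excite: map each channel summary to a gate between 0 and 1.
    gates = [
        1.0 / (1.0 + math.exp(-(gate_weights[d] * means[d] + gate_bias[d])))
        for d in range(dim)
    ]
    # Rescale: channels with small gates are suppressed.
    gated = [[g * x for g, x in zip(gates, frame)] for frame in frames]
    return gated, gates


# Two utterances of different lengths yield pooled vectors of the same size.
short_utt = [[1.0, 0.0], [0.0, 1.0]]
long_utt = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w = [2.0, 0.0]
pooled_short, alphas_short = temporal_attention_pool(short_utt, w)
pooled_long, alphas_long = temporal_attention_pool(long_utt, w)
gated_long, gates_long = channel_attention(long_utt, [1.0, 1.0], [0.0, 0.0])
```

Because the attention weights are normalized over however many frames an utterance has, the pooled vector has the same length for short and long utterances, which is what makes this kind of pooling suitable for variable-length speech.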