In speech emotion recognition, the use of deep learning algorithms that extract and classify features of audio emotion samples usually requires the use of a large amount of resources, which makes the system more complex. This paper proposes a speech emotion recognition system based on dynamic convolutional neural network combined with bi-directional long and short-term memory network. On the one hand, the dynamic convolutional kernel allows the neural network to extract global dynamic emotion information, which can improve the performance while ensuring the computational power of the model, and on the other hand, the bi-directional long and short-term memory network enables the model to classify the emotion features more effectively with the temporal information. In this paper, we use CISIA Chinese speech emotion dataset, EMO-DB German emotion corpus and IEMOCAP English corpus to conduct experiments, and the average emotion recognition accuracy of the experimental results are 59.08%, 89.29% and 71.25%, which are 1.17%, 1.36% and 2.97% higher than the accuracy of speech emotion recognition systems using mainstream models, respectively. The effectiveness of the method in this paper is proved.