Speech recognition (SR), one of the core technologies of human-computer interaction, aims to enable computers to convert speech signals into the corresponding text or commands. With the exponential growth of information on the internet, massive speech datasets exhibit large speaker-to-speaker variation and substantial noise interference, and conventional feature extraction and transformation methods no longer meet the needs of model training and recognition. With the rapid development of machine learning (ML), many researchers have turned to neural networks (NN) to solve problems in the SR field. This article designs a deep learning (DL) algorithm for SR based on convolutional neural networks (CNN) and recurrent neural networks (RNN). First, the speech signal is sampled and filtered, then subjected to pre-emphasis, framing, and endpoint detection. Second, Mel-frequency cepstral coefficients (MFCCs) are extracted from the preprocessed data. Finally, an NN model is constructed and trained, and the qualified trained model is used to recognize the speech features. The experimental results show that the algorithm designed in this paper achieves a lower SR error rate and stronger generalization ability, which is of significance for SR research.
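The preprocessing stage named above (pre-emphasis, framing, endpoint detection) can be illustrated with a minimal sketch. This is not the implementation used in this paper: the pre-emphasis coefficient 0.97, the Hamming window, the 25 ms frame with a 10 ms hop, and the short-time-energy threshold used for endpoint detection are all common defaults assumed here for illustration, and the function names are hypothetical. MFCC extraction and the CNN/RNN model are omitted.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed default)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal, frame_len, hop_len):
    """Split the signal into overlapping frames and apply a Hamming window.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


def detect_endpoints(frames, ratio=0.1):
    """Energy-based endpoint detection (an assumed stand-in method).

    Returns (first, last) indices of frames whose short-time energy exceeds
    ratio * peak energy, or None if there are no frames or all are silent.
    """
    energies = [sum(x * x for x in frame) for frame in frames]
    if not energies or max(energies) == 0:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0], active[-1]


# Usage on a synthetic signal: 0.1 s silence, 0.2 s of a 440 Hz tone, 0.1 s silence.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800

emphasized = pre_emphasis(signal)
frames = frame_signal(emphasized, frame_len=200, hop_len=80)  # 25 ms frames, 10 ms hop
endpoints = detect_endpoints(frames)
print("frames:", len(frames), "speech spans frames:", endpoints)
```

In a full pipeline, only the frames between the detected endpoints would be passed on to MFCC extraction, so that silence does not contribute features to training.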