The rapid development of the transportation industry has introduced new safety hazards. Applying artificial intelligence to the recognition of safe driving behavior can effectively reduce accident rates and economic losses. However, driving monitoring video sequences contain interference such as mixed spatiotemporal background signals, so the recognition accuracy for small targets such as human eyes is low. This paper proposes an improved dual-stream convolutional network for recognizing safe driving behavior. Building on convolutional neural networks (CNNs), an attention mechanism (AM) is integrated into a long short-term memory (LSTM) network structure to form a hybrid dual-stream AM-LSTM convolutional network. The spatial stream channel uses a CNN to extract spatial features from each video image and replaces traditional pooling with pyramid pooling, which normalizes inputs of different scales. The temporal stream channel applies a single-shot multibox detector (SSD) to pairs of adjacent frames in the video sequence to detect small objects such as faces and eyes. AM-LSTM is then used to fuse and classify the information from the two streams. A driving behavior video image set is also constructed. ROC, accuracy, and loss function experiments are carried out on the FDDB database, the VOT100 data set, and the self-built video image set, respectively. Compared with the CNN, SSD, IDT, and dual-stream recognition methods, the proposed method improves accuracy by at least 1.4% and improves the average absolute error on four video sequences by more than 2%. On the self-built image set, the recognition rate for dozing reaches 68.3%, which is higher than that of the other methods. The experimental results show that the method achieves good recognition accuracy and has practical application value.
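
As a rough illustration of why pyramid pooling normalizes scale in the spatial stream, the following sketch pools a feature map over several grid sizes and concatenates the results, so that inputs of different spatial sizes yield a vector of the same length. The function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), and the choice of max pooling are assumptions made for illustration; the abstract does not specify them.

```python
import math
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over an n x n grid for each n in
    `levels` and concatenate the results into one fixed-length vector.
    The levels and the max operator are illustrative assumptions."""
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil bin edges cover the whole map and are never empty.
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    # Output length is C * sum(n * n for n in levels), independent of H and W.
    return np.concatenate(pooled)

# Two feature maps with different spatial sizes give vectors of equal length.
rng = np.random.default_rng(0)
small = rng.random((8, 13, 17))
large = rng.random((8, 32, 24))
vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
print(vec_small.shape, vec_large.shape)  # (168,) (168,)
```

With 8 channels and levels (1, 2, 4), each input produces 8 × (1 + 4 + 16) = 168 values, which is what allows a fixed-size classifier to follow a feature extractor that receives frames at varying scales.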