To address the limited accuracy that results from insufficient mining of spatiotemporal features when identifying unsafe behavior and hazards among construction personnel, the traditional two-stream convolutional model is improved, and a two-stream dangerous-behavior recognition model based on Faster R-CNN-LSTM is proposed. In this model, a Faster R-CNN network is connected in parallel with an LSTM network. The Faster R-CNN network serves as the spatial stream: human spatial motion posture is divided into static and dynamic features, anchor point features are extracted from each, and their fusion forms the output of the spatial stream. The temporal stream uses an improved sliding long short-term memory (LSTM) network to strengthen the extraction of time-series features of construction personnel. Finally, the two branches are fused across time and space to classify whether construction personnel are wearing safety helmets. The results show that the mean average precision (mAP) of the improved Faster R-CNN-LSTM framework increases by 15%. The original CNN-LSTM framework detected four targets with one misdetection, for an accuracy of 91.48%; the improved framework reaches a detection accuracy of 99.99% with no false detections. The proposed method outperforms the unimproved model and other methods, effectively identifies unsafe behavior of workers on construction sites, and also distinguishes ambiguous actions well.
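The final fusion of the two branches can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted averaging of per-stream class probabilities below is an assumption, as are the function names, the `spatial_weight` parameter, and the two class labels; the scores stand in for the outputs of the spatial (Faster R-CNN) and temporal (LSTM) streams, which are not modeled here.

```python
import math

# Hypothetical class labels for the helmet-wearing decision.
CLASSES = ("helmet", "no_helmet")


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(spatial_scores, temporal_scores, spatial_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    Assumption: the source only says the branches are fused; weighted
    averaging is one common choice, not necessarily the authors' method.
    """
    if len(spatial_scores) != len(temporal_scores):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= spatial_weight <= 1.0:
        raise ValueError("spatial_weight must lie in [0, 1]")
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    return [
        spatial_weight * ps + (1.0 - spatial_weight) * pt
        for ps, pt in zip(p_spatial, p_temporal)
    ]


def classify(spatial_scores, temporal_scores, spatial_weight=0.5):
    """Return the predicted label and the fused class probabilities."""
    probs = fuse_streams(spatial_scores, temporal_scores, spatial_weight)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


if __name__ == "__main__":
    # Illustrative scores only; they are not results from the paper.
    label, probs = classify([2.0, 0.0], [0.0, 1.0])
    print(label, [round(p, 3) for p in probs])
```

Averaging probabilities rather than raw scores keeps the two streams on a comparable scale, and `spatial_weight` makes the relative trust in each stream explicit.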