Safety helmets are essential in many indoor and outdoor workplaces, such as high-temperature metallurgical operations and high-rise building construction, to prevent injuries and ensure safe production. However, manual supervision of helmet use is costly, inconsistently enforced, and subject to other human factors. Moreover, detection of small target objects frequently lacks precision. Automating helmet supervision with a helmet detection algorithm is a promising way to address these issues. In this study, we propose a modified version of YOLOv5s, a lightweight deep learning-based object detection network. The proposed model extends YOLOv5s and enhances its performance by recalculating the prediction frames: the anchor frames are re-clustered with the K-means++ method, using the IoU metric as the clustering distance. The global attention mechanism (GAM) and the convolutional block attention module (CBAM) are added to the YOLOv5s network to improve its backbone and neck networks. By minimizing the loss of feature information and enhancing the representation of global interactions, these attention mechanisms strengthen the network's capacity for feature extraction. Furthermore, the CBAM is integrated into the CSP module to improve target feature extraction while keeping the computational cost of the model low. To increase the efficiency and precision of the prediction box regression, the proposed model additionally uses the recently proposed SIoU (SCYLLA-IoU) loss as the bounding box loss function. Finally, knowledge distillation is applied to the improved YOLOv5s model to obtain a lightweight network, thereby reducing the computational workload and improving the detection speed to meet the needs of real-time monitoring.
The experimental results demonstrate that the proposed model outperforms the original YOLOv5s network model in terms of precision, recall, and mean average precision (mAP). The proposed model can also identify helmet use more effectively in low-light conditions and at a variety of distances.
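The anchor re-clustering step mentioned above can be illustrated with a minimal sketch. It is not the implementation used in this study; it assumes that each ground-truth box is reduced to a (width, height) pair, that the clustering distance is d = 1 - IoU, and that cluster centers are updated with the mean (some YOLO variants use the median instead). The function names `wh_iou`, `kmeanspp_init`, and `cluster_anchors` are hypothetical and introduced only for illustration.

```python
import random


def wh_iou(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeanspp_init(boxes, k, rng):
    """K-means++ seeding, using d = 1 - IoU as the distance (an assumption)."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        # Squared distance from each box to its nearest already-chosen center.
        d2 = [min(1.0 - wh_iou(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:
            # Every box coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(boxes))
        else:
            # Boxes far from all centers are more likely to become the next seed.
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


def cluster_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchors, sorted by area."""
    rng = random.Random(seed)
    centers = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        # Assignment step: each box joins the center with the highest IoU.
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            groups[best].append(b)
        # Update step: mean width and height; an empty cluster keeps its center.
        new_centers = []
        for group, old in zip(groups, centers):
            if group:
                mean_w = sum(w for w, _ in group) / len(group)
                mean_h = sum(h for _, h in group) / len(group)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda a: a[0] * a[1])


def mean_best_iou(boxes, anchors):
    """Average IoU between each box and its best-matching anchor."""
    return sum(max(wh_iou(b, a) for a in anchors) for b in boxes) / len(boxes)
```

Calling `cluster_anchors(boxes, k=9)` on the (width, height) pairs of a labeled helmet dataset would yield nine candidate anchors, and `mean_best_iou` gives a simple measure of how well those anchors fit the labels. Using 1 - IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when many targets are small.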