This work proposes an enhanced YOLOX model for real-time helmet-wearing detection, a task that suffers from low detection accuracy, false detections, and missed detections. First, recursive gated convolution (gnConv) replaces traditional convolution in the backbone network, which addresses the many worthless features that conventional convolution extracts because of excessive redundancy during feature extraction. Second, the EfficientNet-BiFPN layer replaces the original FPN layer in the neck network, realizing top-down and bottom-up bidirectional fusion of deep and shallow features and improving the flow of feature data between network layers. Lastly, the SIoU cross-entropy loss function is adopted to reduce missed detections in crowded environments and further increase the model’s detection precision. Experiments and data comparisons indicate that the modified model’s average detection accuracy is 95.5%, which is 5.4% higher than that of the original network model, and that its detection speed is markedly higher, meeting actual production requirements.
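To make the bidirectional fusion step concrete, the following is a minimal sketch of the fast normalized fusion rule that BiFPN, as introduced in the EfficientDet paper, uses to combine feature maps: each input gets a learnable weight, the weights are clipped to be non-negative, and the result is a normalized weighted sum. It is not this work's implementation. The function name `fast_normalized_fusion` is hypothetical, the feature maps are assumed to be already resized to a common shape and flattened into plain lists of floats, and the weights are fixed numbers here rather than trained parameters.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors as sum_i (w_i / (eps + sum_j w_j)) * x_i.

    Illustrative only: `features` is a list of equal-length lists of floats
    (assumed to be feature maps already resampled to one resolution), and
    `weights` holds one scalar per feature map.
    """
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per feature map")

    # Clip weights at zero (ReLU) so every contribution is non-negative.
    clipped = [max(0.0, w) for w in weights]
    # eps keeps the division stable when all weights are clipped to zero.
    total = sum(clipped) + eps

    length = len(features[0])
    fused = [0.0] * length
    for feat, w in zip(features, clipped):
        if len(feat) != length:
            raise ValueError("all feature maps must have the same length")
        for i, value in enumerate(feat):
            fused[i] += (w / total) * value
    return fused


if __name__ == "__main__":
    shallow = [1.0, 2.0]   # stand-in for a shallow, high-resolution feature
    deep = [3.0, 4.0]      # stand-in for a deep, upsampled feature
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

In a full BiFPN layer this fusion is applied at every node of both the top-down and the bottom-up path, and is typically followed by a convolution; those parts are omitted here.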