Inspecting hot-rolled steel strips for surface defects is an essential industrial application. Automatic visual inspection systems for hot-rolled strips must meet strict real-time requirements, yet their capabilities are constrained by the accuracy and processing speed of the defect detection algorithm. To address the poor detection accuracy, low detection efficiency, and unsuitability for low-computing-power platforms of existing hot-rolled strip surface defect detection algorithms, a Swin-Transformer-YOLOv5 model based on an improved one-stage detector is proposed. GhostNet is employed to make the model lightweight while preserving detection accuracy. Swin-Transformer is introduced into the C3 module to address cluttered backgrounds in defect images and easily confused defect categories. A CoordAttention module is added to improve the model's capacity to extract defect features, which further improves detection performance. BiFPN is employed for feature fusion to address large differences in defect scale and poor detection of small defects, improving the detector's ability to adapt to targets of different scales. The experimental results demonstrate that the improved Swin-Transformer-YOLOv5 model significantly outperforms mainstream target detection algorithms: its mAP improves by 8.39% over the original model, while the number of parameters, GFLOPs, and weight size are reduced by 36.6%, 40.0%, and 34.7%, respectively. As a result, the model is better suited for deployment on low-computing-power platforms.
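
The weighted feature fusion that BiFPN relies on can be sketched as follows. This is a minimal illustration of the fast normalized fusion rule from the original BiFPN design, not the authors' implementation. The function name `fast_normalized_fusion`, the flattened-list representation of feature maps, and the default `eps` value are assumptions made for illustration.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-sized feature maps using non-negative, normalized weights.

    features: list of equally long lists of floats (flattened feature maps;
              an illustrative assumption, real detectors fuse tensors).
    weights:  one scalar per input feature map (learnable in a real model).
    eps:      small constant that avoids division by zero.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature maps must have the same size")

    # ReLU keeps every weight non-negative so the normalization is stable.
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped) + eps

    # Weighted sum of the inputs, each scaled by its normalized weight.
    fused = [0.0] * length
    for w, feature in zip(clipped, features):
        scale = w / total
        for i, value in enumerate(feature):
            fused[i] += scale * value
    return fused


if __name__ == "__main__":
    # Hypothetical inputs: a shallow and a deep feature map already resized
    # to the same resolution, with the shallow map weighted more heavily.
    shallow = [0.2, 0.8, 0.5]
    deep = [0.6, 0.4, 0.1]
    print(fast_normalized_fusion([shallow, deep], [2.0, 1.0]))
```

Because the weights are normalized rather than passed through a softmax, each input's contribution stays between 0 and 1 at low computational cost, which is in keeping with the lightweight goal stated above.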