Strip surface defects are small, densely distributed, and surrounded by background noise, so current object detection models often perform poorly on them. To address this issue, we propose a spatial-to-depth feature-enhanced detection method called STD-Detector. The method is built on two modules, STD-Conv-A and STD-Conv-B. First, the STD-Conv-A module is used in the backbone feature extraction network to enlarge the receptive field and enable the model to learn a wider range of background information. Then, the STD-Conv-B module is used in the feature fusion network to improve the expressiveness of the output features. In addition, we incorporate the convolutional block attention module (CBAM) to mitigate background interference and further improve model performance. Experimental results on the NEU-DET dataset show that our method achieves a mean average precision of 82.9%, a 3.9% improvement over the baseline. Compared with state-of-the-art models, our method is more competitive in detecting strip surface defects. Moreover, experimental results on a road surface defect dataset show that our method is robust.
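As a point of reference for the spatial-to-depth idea named above, the following is a minimal sketch of the generic space-to-depth rearrangement, which moves each spatial block of pixels into the channel dimension so that resolution is reduced without discarding any values. It is an illustration under stated assumptions, not the internals of STD-Conv-A or STD-Conv-B, which the abstract does not specify. The function name `space_to_depth`, the `block` parameter, and the nested-list layout (channels, height, width) are hypothetical choices made for this example.

```python
def space_to_depth(x, block=2):
    """Rearrange a C x H x W nested list into (C*block*block) x (H/block) x (W/block).

    Generic space-to-depth rearrangement; assumed here for illustration only.
    """
    channels = len(x)
    height = len(x[0])
    width = len(x[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")

    out = []
    # For each offset (dy, dx) inside a block, take every block-th pixel
    # and store the resulting sub-sampled map as a new channel.
    for dy in range(block):
        for dx in range(block):
            for c in range(channels):
                out.append([
                    [x[c][i][j] for j in range(dx, width, block)]
                    for i in range(dy, height, block)
                ])
    return out


# One 4 x 4 channel holding the values 0..15 (hypothetical toy input).
feature_map = [[[r * 4 + col for col in range(4)] for r in range(4)]]
rearranged = space_to_depth(feature_map, block=2)

print(len(rearranged), len(rearranged[0]), len(rearranged[0][0]))  # 4 2 2
print(rearranged[0])  # [[0, 2], [8, 10]]
```

The toy input shows why such a rearrangement suits small defects: a 4 x 4 map becomes four 2 x 2 maps, so every original value is still available to the following convolution, whereas strided convolution or pooling would drop or merge some of them.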