The surface quality requirements for hot-rolled strip steel keep rising, and deep learning-based defect detection has become a key technology for meeting them. However, owing to the constraints of industrial production, defect datasets often suffer from insufficient training samples and class imbalance. This paper addresses these problems with the GT-CutMix offline data augmentation algorithm and lightweight small-sample defect detection models. GT-CutMix significantly improves defect utilization by accurately sampling defect locations and integrating them into the original dataset. We design the S-DSSD defect detection model by replacing the ResNet101 backbone of the deconvolutional single shot detector (DSSD) with a lightweight SI-MobileNet, which reduces the number of parameters and resource consumption and speeds up both training and inference. To further improve detection accuracy, we integrate the pyramid split attention (PSA) mechanism into the prediction module of DSSD, yielding the SA-DSSD model. With GT-CutMix augmentation, S-DSSD and SA-DSSD achieve mAPs of 76.83% and 78.63% on the X-SDD dataset, at detection speeds of 45 FPS and 40 FPS, respectively. In a cross-dataset experiment on NEU-DET, SA-DSSD reaches an mAP of 74.88%. These results show that our methods are effective and generalize well for small-sample defect detection, offering selectable solutions for industrial production scenarios that prioritize either high speed or high precision.
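The abstract only describes GT-CutMix at a high level (sample defect locations, then integrate them into the original data). The sketch below illustrates one plausible reading of that idea and is not the authors' implementation: it assumes each annotated defect is cut out as an axis-aligned rectangle from a source image and pasted into a target image at a random position that does not overlap existing boxes, with the pasted box appended to the target's annotations. Images are modeled as grayscale lists of pixel rows; the names `gt_cutmix`, `overlaps`, and `max_tries` are hypothetical, and the real method may also select defects by class to counter imbalance or blend patch edges.

```python
import random
from typing import List, Tuple

Image = List[List[int]]              # hypothetical: grayscale image as rows of pixel values
Box = Tuple[int, int, int, int, int]  # (x1, y1, x2, y2, class_id); x2/y2 are exclusive


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def gt_cutmix(src_img: Image, src_boxes: List[Box],
              dst_img: Image, dst_boxes: List[Box],
              rng: random.Random, max_tries: int = 20) -> Tuple[Image, List[Box]]:
    """Paste each annotated defect patch from src_img into a copy of dst_img.

    Assumption (not from the paper): patches are rectangular crops placed at
    random positions that avoid every box already present in the target.
    """
    out = [row[:] for row in dst_img]  # leave the original target untouched
    new_boxes = list(dst_boxes)
    h, w = len(out), len(out[0])
    for x1, y1, x2, y2, cls in src_boxes:
        bw, bh = x2 - x1, y2 - y1
        if bw > w or bh > h:
            continue  # patch cannot fit in the target image
        for _ in range(max_tries):
            nx = rng.randint(0, w - bw)  # randint is inclusive on both ends
            ny = rng.randint(0, h - bh)
            cand = (nx, ny, nx + bw, ny + bh, cls)
            if not any(overlaps(cand, b) for b in new_boxes):
                for r in range(bh):  # copy the defect patch row by row
                    out[ny + r][nx:nx + bw] = src_img[y1 + r][x1:x2]
                new_boxes.append(cand)  # the pasted defect becomes a new label
                break
    return out, new_boxes
```

Running this offline over a dataset would add extra labeled defect instances per image without collecting new samples, which is the effect the abstract attributes to GT-CutMix.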