Straw burning is a highly destructive practice that threatens people’s livelihoods and property and causes lasting environmental damage, so detecting and controlling it is essential. In this study, we analyzed Sentinel-2 data and selected the bands that best separate clouds, smoke, water bodies, and background (vegetation and bare soil), based on how each of these classes responds in the different bands. Each selected band was added to the red, green, and blue (RGB) bands to form the training sample data. The band combination with the highest detection accuracy, RGB_Band6, was finally selected, with an accuracy of 82.90%. Because existing object detection models cannot directly handle multi-band images, we modified the input layer of the YOLOv5s model to build an object detection network suitable for multi-band remote sensing images. A Squeeze-and-Excitation (SE) attention mechanism was introduced into the YOLOv5s model to enhance the subtle features of smoke, and the Convolution + Batch normalization + Leaky ReLU (CBL) module was replaced with the Convolution + Batch normalization + Mish (CBM) module. These changes improved the accuracy of the model to 75.63%, which was 1.81% higher than before. We also examined the effect of spatial resolution on detection, obtaining accuracies of 84.18%, 73.13%, and 45.05% for images of 60-, 20-, and 10-m resolution, respectively. The experimental results demonstrated that the accuracy of the model did not necessarily improve with increasing spatial resolution. This study provides a technical reference for monitoring straw burning, which is vital both for controlling straw burning and for improving ambient air quality.
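To make the architectural changes named above more concrete, the following is a minimal, dependency-free sketch of the three ideas: stacking one extra band onto RGB to form a four-channel input, SE channel attention (global average pooling, two fully connected layers, sigmoid gating), and the Mish activation that distinguishes CBM from CBL. It is not the study's implementation. The function names, the toy 2 × 2 image, the weight values, and the Leaky ReLU slope of 0.1 are all assumptions chosen for illustration, and a real model would use learned weights inside a deep learning framework.

```python
import math


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    # softplus(x) = log(1 + exp(x)); for large x it is approximately x.
    softplus = x if x > 20 else math.log1p(math.exp(x))
    return x * math.tanh(softplus)


def leaky_relu(x, slope=0.1):
    """Leaky ReLU, shown for comparison; the slope value is an assumption."""
    return x if x >= 0 else slope * x


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def stack_bands(rgb, extra_band):
    """Append one extra band to a 3-channel image, giving 4 channels."""
    return rgb + [extra_band]


def se_scale(feature_map, w1, w2):
    """Squeeze-and-Excitation over a map laid out as [channel][row][col].

    w1 has shape (reduced, channels); w2 has shape (channels, reduced).
    Returns the rescaled map and the per-channel gates.
    """
    # Squeeze: global average pooling, one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy 2x2 example with hypothetical reflectance values and weights.
rgb = [
    [[0.2, 0.4], [0.6, 0.8]],
    [[0.1, 0.3], [0.5, 0.7]],
    [[0.0, 0.2], [0.4, 0.6]],
]
band6 = [[0.9, 0.7], [0.5, 0.3]]
x = stack_bands(rgb, band6)  # RGB_Band6-style 4-channel input

w1 = [[0.5, -0.2, 0.1, 0.3], [-0.1, 0.4, 0.2, -0.3]]
w2 = [[0.6, -0.4], [0.2, 0.1], [-0.3, 0.5], [0.7, 0.2]]
scaled, gates = se_scale(x, w1, w2)
```

In this sketch each gate lies strictly between 0 and 1, so SE can only attenuate or preserve a channel relative to the others. Unlike Leaky ReLU, Mish is smooth everywhere and lets small negative values pass through with a nonlinear weighting.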