Forest fire monitoring plays a crucial role in preventing and mitigating forest disasters, and early detection of forest fire smoke is essential for a timely emergency response. Effective monitoring depends on handling smoke targets of various levels in the monitoring images, strengthening the model’s robustness to interference from mountain clouds and fog, and reducing false positives and missed detections. In this paper, we propose SIMCB-Yolo, an improved multi-level forest fire smoke detection model based on You Only Look Once v5s (Yolov5s), which aims to achieve high-precision detection of forest fire smoke at various levels. First, to address the low precision of small-target smoke detection, a Swin transformer small-target detection head is added to the neck of Yolov5s. Then, because improving small-target accuracy lowers the detection accuracy for conventional smoke targets and causes missed detections, a cross stage partial network bottleneck with three convolutional layers (C3) and a channel block sequence (CBS) are introduced into the backbone; these additions help extract more surface features and enhance the detection accuracy for conventional target smoke. Finally, the SimAM attention mechanism is introduced to counter complex background interference, further reducing false positives and missed detections. Experimental results show that SIMCB-Yolo achieves a mean average precision at an IoU threshold of 0.5 (mAP50) of 85.6%, an increase of 4.5% over the Yolov5s model, and an mAP50-95 of 63.6%, an improvement of 6.9%, indicating good detection accuracy.
On the self-built forest fire smoke dataset, SIMCB-Yolo also performs significantly better than current mainstream models, demonstrating high practical value.
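To make the role of the SimAM attention mechanism more concrete, the following is a minimal sketch of the commonly published SimAM formulation applied to a single feature-map channel. It is an illustration under stated assumptions, not the SIMCB-Yolo implementation: the function name `simam_channel`, the nested-list input format, and the regularization value `e_lambda=1e-4` are all hypothetical choices, and where the module is placed in the network is not shown.

```python
import math


def simam_channel(channel, e_lambda=1e-4):
    """Apply SimAM-style, parameter-free attention to one channel.

    `channel` is an H x W nested list of floats (hypothetical input format).
    `e_lambda` is a small regularization constant (assumed value).
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" activations in the channel
    if n < 1:
        raise ValueError("channel needs at least two activations")

    mu = sum(values) / len(values)
    # Squared deviation of each activation from the channel mean.
    sq_dev = [[(v - mu) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for activations that stand out
            # from the rest of the channel.
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            # A sigmoid turns the inverse energy into a weight in (0, 1).
            weight = 1.0 / (1.0 + math.exp(-inv_energy))
            out_row.append(v * weight)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # A flat background with one distinctive activation in the centre.
    feature = [[1.0, 1.0, 1.0],
               [1.0, 5.0, 1.0],
               [1.0, 1.0, 1.0]]
    for row in simam_channel(feature):
        print([round(v, 3) for v in row])
```

Because the weights are computed from channel statistics alone, this kind of attention adds no learnable parameters. In this sketch, activations that differ from a uniform background receive a larger weight than the background itself, which matches the motivation of suppressing background interference such as clouds and fog.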