To address the low detection accuracy and limited recognition rate for small fire targets that current fire detection algorithms exhibit in complex environments, we present a lightweight fire detection algorithm based on an improved YOLOv5s. First, we introduce the CoT (Contextual Transformer) structure into the backbone network and construct a novel CSP1_CoT (Cross Stage Partial 1_Contextual Transformer) module, which reduces the model’s parameter count while enhancing the feature extraction and fusion capabilities of the backbone. Second, we extend the network’s Neck with a dedicated detection layer for small targets and incorporate the SE (Squeeze-and-Excitation) attention mechanism; this adds few parameters while strengthening the interaction of multi-feature information, which improves small target detection. Third, we replace the original loss function with the Focal-EIoU (Focal-Efficient IoU) loss function, which further improves the model’s convergence speed and precision. Experimental results show that the modified model achieves an mAP@.5 of 96% and an accuracy of 94.8%, improvements of 8.8% and 8.9%, respectively, over the original model. The model’s parameter count is reduced by 1.1%, giving a compact model size of only 14.6 MB, and the detection speed reaches 85 FPS (frames per second), which satisfies real-time detection requirements. The improved model thus raises precision and accuracy while meeting real-time and lightweight constraints, making it well suited to the demands of fire detection.
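
To make the Focal-EIoU loss named above concrete, the following is a minimal sketch for a single pair of axis-aligned boxes. It assumes the commonly cited formulation, in which the EIoU loss adds a center-distance penalty and separate width and height penalties to 1 - IoU, and the focal variant weights that loss by IoU raised to a power gamma. The function name `focal_eiou_loss`, the (x1, y1, x2, y2) box format, and the default gamma = 0.5 are assumptions for illustration; the abstract does not specify them, and the model's actual training code may differ.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss for two boxes given as (x1, y1, x2, y2).

    Illustrative sketch only: gamma=0.5 and the box format are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and target boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    center_dist2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0

    # EIoU: IoU term + center-distance term + width term + height term.
    eiou = (
        1.0 - iou
        + center_dist2 / (cw ** 2 + ch ** 2 + eps)
        + (pw - tw) ** 2 / (cw ** 2 + eps)
        + (ph - th) ** 2 / (ch ** 2 + eps)
    )

    # Focal weighting: boxes with higher IoU contribute more to the loss.
    return (iou ** gamma) * eiou


if __name__ == "__main__":
    print(focal_eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

In this form the weight IoU^gamma is zero for boxes that do not overlap at all, so such pairs contribute no loss; whether and how a given implementation handles that case is a design choice not described here.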