Aeroplane targets in images captured by camera sensors are typically small and set against intricate backgrounds, which makes them difficult for existing deep learning algorithms to detect. To address this issue, this paper incorporates the receptive field block from RFBNet, a coordinate attention mechanism, and the SIoU loss function into the YOLOv5 algorithm. The resulting model, YOLOv5-RSC, detects aeroplanes and their undercarriages. The primary goal is to combine camera sensors with deep learning algorithms so that targets in captured images are detected more precisely. YOLOv5-RSC improves on YOLOv5 in three respects. First, a receptive field block is introduced into the backbone network, which enlarges the receptive field of the feature map, strengthens the connection between shallow and deep feature maps, and improves the model's use of feature information. Second, a coordinate attention mechanism is added to the feature fusion network. By applying attention in both the channel and spatial dimensions, it helps the model locate the targets of interest more accurately, focus on key information, and detect with higher precision. Third, the SIoU bounding box loss function is adopted to address IoU's insensitivity to scale and to speed up the convergence of bounding box regression. A Basler camera experimental platform was then constructed for experimental verification. The results show that the AP values of the YOLOv5-RSC detection model for aeroplane and undercarriage are 92.4% and 80.5%, respectively, and that the mAP value is 86.4%. These values are 2.0%, 5.4%, and 3.7% higher, respectively, than those of the original YOLOv5 algorithm, and the detection speed reaches 89.2 FPS. These findings indicate that the model achieves high detection precision and speed, providing a valuable reference for aeroplane undercarriage detection.
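The reported metrics can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions that the text does not state explicitly: that mAP is the unweighted mean of the two per-class AP values, and that the reported gains are absolute percentage points over the original YOLOv5. The names `REPORTED_AP`, `REPORTED_GAIN`, `mean_average_precision`, and `implied_baseline` are hypothetical and do not come from the paper.

```python
# Figures reported in the text, in percent.
REPORTED_AP = {"aeroplane": 92.4, "undercarriage": 80.5}
REPORTED_MAP = 86.4

# Reported gains over the original YOLOv5.
# Assumption: these are absolute percentage points, not relative changes.
REPORTED_GAIN = {"aeroplane": 2.0, "undercarriage": 5.4, "mAP": 3.7}


def mean_average_precision(ap_by_class):
    """Unweighted mean of per-class AP values (assumed definition of mAP)."""
    return sum(ap_by_class.values()) / len(ap_by_class)


def implied_baseline(reported, gain):
    """Baseline value implied by a reported value and an absolute gain."""
    return round(reported - gain, 1)


if __name__ == "__main__":
    recomputed = mean_average_precision(REPORTED_AP)
    print("mAP recomputed from per-class AP:", recomputed)
    print("mAP as reported:", REPORTED_MAP)

    for name, value in {**REPORTED_AP, "mAP": REPORTED_MAP}.items():
        baseline = implied_baseline(value, REPORTED_GAIN[name])
        print(f"implied original YOLOv5 {name}:", baseline)
```

Under these assumptions the recomputed mean agrees with the reported mAP to within 0.1, a gap that is expected because the per-class AP values are themselves rounded to one decimal place. The implied baseline figures are derived quantities, not numbers stated in the text.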