With advances in aerospace technology, the amount of information contained in aerial remote sensing images is steadily increasing, and detection techniques aimed at small targets have developed accordingly. Compared with large targets, lightweight small targets pack more information into each pixel per unit area, yet they occupy so small an area that conventional detection models easily overlook them. To strengthen the attention paid to such targets, this study first introduces the Convolutional Block Attention Module (CBAM) into the fifth-generation You Only Look Once (YOLOv5) algorithm, producing an optimized algorithm. A convolutional neural network is then used to mark the feature pixels of the optimized algorithm and eliminate redundant information, producing a fusion algorithm that addresses the highly similar redundant information generated when the optimized algorithm surveys pixel blocks. The novelty of this study lies in using CBAM to improve YOLOv5. The CBAM module extracts important features from images by adaptively learning channel and spatial attention over the feature maps. By weighting the feature map along both its channel and spatial dimensions, the network attends more to important features and suppresses irrelevant background information, which helps it capture the characteristics of small targets and improves detection accuracy and robustness. Embedding the CBAM module into the YOLOv5 detection network enhances the network's perception of small targets and improves its expressive and feature-extraction ability without substantially increasing network complexity. With CBAM, YOLOv5 can therefore better capture the characteristics of small targets in aerial remote sensing images and improve detection accuracy and recall.
Finally, the proposed fusion algorithm is evaluated on the Tiny-Person dataset and compared with the fifth, sixth, and seventh generations of You Only Look Once (YOLOv5, YOLOv6, and YOLOv7). When the fusion algorithm is tested on the targets, the classification accuracy is 39 % for Sea-person and 31 % for Earth-person, and the probability of these classes being predicted as background is 56 % and 67 %, respectively. The overall accuracy of the algorithm is 0.987, the best among the four algorithms. The experimental results show that the proposed fusion algorithm localizes lightweight small targets precisely and can be applied effectively to aerial remote sensing images.
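The channel and spatial attention weighting attributed to CBAM above can be illustrated with a minimal NumPy sketch of a single forward pass. This is not the study's implementation: the random weights stand in for learned parameters, the reduction ratio `r = 4` and the 7x7 spatial kernel are illustrative assumptions, and the sketch does not show where the module is embedded in YOLOv5.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Per-channel weights in (0, 1) for a feature map of shape (C, H, W).

    w1 (C//r, C) and w2 (C, C//r) are the weights of a shared two-layer MLP
    applied to both the average-pooled and max-pooled channel descriptors.
    """
    avg = feat.mean(axis=(1, 2))   # (C,) global average pooling
    mx = feat.max(axis=(1, 2))     # (C,) global max pooling

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(feat, kernel):
    """Per-location weights in (0, 1), shape (H, W).

    kernel has shape (2, k, k) with odd k; it is slid over the stacked
    channel-wise average and max maps with zero padding.
    """
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    height, width = feat.shape[1:]
    out = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    ca = channel_attention(feat, w1, w2)
    feat = feat * ca[:, None, None]
    sa = spatial_attention(feat, kernel)
    return feat * sa[None, :, :]

# Hypothetical sizes and random stand-in weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

refined = cbam(feat, w1, w2, kernel)
print(refined.shape)
```

Because both attention maps pass through a sigmoid, every weight lies strictly between 0 and 1, so the module rescales features rather than creating new ones; the output keeps the input shape, which is what would allow such a block to be placed between existing layers of a detector.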