Unmanned aerial vehicles (UAVs) equipped with high-definition cameras can quickly acquire image data and are widely applied. However, object detection in UAV imagery suffers from small objects, large scale variations, dense objects, and occlusion. To address these issues, we propose LMS-YOLOv7, which combines multi-scale feature fusion with a lightweight decoupled head. First, we propose the Skipping Fusion Efficient Layer Aggregation Network (SF-ELAN), in which a cross-layer fusion strategy segments, extracts, and fuses features along the channel dimension to enhance feature representation capability. Second, based on the multi-scale feature fusion approach, we propose the Multi-scale Cross Stage Partial (MS-CSP) module, which fuses deep and shallow information to avoid losing features. In addition, based on our redesigned LightGSConv, we propose the lightweight decoupled detection head (LD-Head), which decouples the localization information of the feature map from the classification information while keeping the detection head lightweight. Extensive experiments on VisDrone and WiderPerson demonstrate that the proposed method perceives small objects better: its mAP@.5 on VisDrone improves by 3.6% over the baseline method, and it achieves the highest accuracy on WiderPerson. The code for the proposed method is publicly available on GitHub.
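For readers unfamiliar with the mAP@.5 metric quoted above, the following minimal sketch illustrates the matching criterion it conventionally rests on: a predicted box is counted as a true positive when its intersection over union (IoU) with a ground-truth box is at least 0.5. The function names `iou` and `is_true_positive` and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for illustration only; this is not taken from the authors' released code, and the exact evaluation protocol used in the experiments is not specified here.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes yield no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Hypothetical helper: the '@.5' in mAP@.5 is this IoU threshold."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)          # a small 10x10 pixel object
    shifted = (5.0, 0.0, 15.0, 10.0)     # same size, shifted by 5 pixels
    print(iou(gt, shifted), is_true_positive(shifted, gt))
```

In this toy case a 5-pixel horizontal shift of a 10x10 box lowers the IoU to one third, so the prediction no longer counts as a match at the 0.5 threshold. This is one way to see why small objects are hard to score well under the metric: a localization error of a few pixels is a large fraction of the object's extent.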