Object detection in drone aerial imagery has been a consistent focus of research. Compared with standard images, aerial images have more intricate backgrounds, greater variation in object scale, and a higher proportion of small objects. Consequently, conventional object detection algorithms are often unsuitable for direct application in drone scenarios. To address these challenges, this study proposes a drone object detection model based on YOLOv5, named SMT-YOLOv5 (Small Target-YOLOv5). The feature fusion network is improved by adding detection layers and adopting a weighted bidirectional feature pyramid network. In addition, the Combine Attention and Receptive Fields Block (CARFB), a receptive field feature extraction module, and the DyHead dynamic detection head are introduced to broaden the receptive field, mitigate information loss, and enhance perception across the spatial, scale, and task dimensions. Experiments on the VisDrone2021 dataset confirm a significant improvement in the detection accuracy of SMT-YOLOv5. Each improvement strategy proves effective, and the resulting model raises average precision by 12.4 percentage points over the original method. Detection accuracy for large, medium, and small targets improves by 6.9%, 9.5%, and 7.7%, respectively, compared to the original method. Applying the same improvement strategies to the low-complexity YOLOv8n yields SMT-YOLOv8n, which is comparable in complexity to SMT-YOLOv5s. The results indicate that SMT-YOLOv5s achieves an average precision 2.5 percentage points higher than that of SMT-YOLOv8n. Furthermore, comparative experiments with other enhancement methods demonstrate the effectiveness of the improvement strategies.
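The weighted bidirectional feature pyramid network mentioned above fuses feature maps from different scales using learnable per-input weights. The abstract does not say which fusion rule SMT-YOLOv5 uses, so the following is only a minimal sketch that assumes the "fast normalized fusion" rule from the original BiFPN design: each weight is clipped to be non-negative, and the weighted sum is divided by the sum of the weights plus a small constant. The function name `fast_normalized_fusion`, the flat-list representation of feature maps, and the default `eps` are illustrative assumptions, not part of the proposed model.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-sized feature vectors with non-negative normalized weights.

    Hypothetical helper for illustration: real feature maps are tensors
    that have already been resized to a common resolution.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    if not features:
        raise ValueError("need at least one input feature")

    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all input features must have the same size")

    # ReLU keeps every weight non-negative so the normalization is stable.
    positive = [max(0.0, w) for w in weights]
    total = eps + sum(positive)

    # Element-wise weighted sum, divided by the (slightly padded) weight sum.
    return [
        sum(w * f[i] for w, f in zip(positive, features)) / total
        for i in range(length)
    ]


# Two inputs with equal weights are averaged (up to the eps term).
fused_equal = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], eps=0.0)

# A negative weight is clipped to zero, so only the first input contributes.
fused_clipped = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, -5.0], eps=0.0)
```

In a trained network the weights would be learned parameters, one set per fusion node, so that each node can decide how much each input scale contributes.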