Unmanned aerial vehicle (UAV) aerial images often present challenges such as small target sizes, high target density, varied shooting angles, and dynamic poses. Existing target detection algorithms exhibit a noticeable performance decline when confronted with UAV aerial images compared to general scenes. This paper proposes an outstanding small target detection algorithm for UAVs, named Fine-Grained Feature Perception YOLOv8s-P2 (FGFP-YOLOv8s-P2), based on YOLOv8s-P2 architecture. We specialize in improving inspection accuracy while meeting real-time inspection requirements. First, we enhance the targets’ pixel information by utilizing slice-assisted training and inference techniques, thereby reducing missed detections. Then, we propose a feature extraction module with deformable convolutions. Decoupling the learning process of offset and modulation scalar enables better adaptation to variations in the size and shape of diverse targets. In addition, we introduce a large kernel spatial pyramid pooling module. By cascading convolutions, we leverage the advantages of large kernels to flexibly adjust the model’s attention to various regions of high-level feature maps, better adapting to complex visual scenes and circumventing the cost drawbacks associated with large kernels. To match the excellent real-time detection performance of the baseline model, we propose an improved Random FasterNet Block. This block introduces randomness during convolution and captures spatial features of non-linear transformation channels, enriching feature representations and enhancing model efficiency. Extensive experiments and comprehensive evaluations on the VisDrone2019 and DOTA-v1.0 datasets demonstrate the effectiveness of FGFP-YOLOv8s-P2. This achievement provides robust technical support for efficient small target detection by UAVs in complex scenarios.