The rapid expansion of the drone industry has substantially increased the number of drones operating at low altitude, raising concerns about collision avoidance and countermeasure strategies between these unmanned aerial vehicles. These challenges underscore the urgent need for air-to-air drone target detection. An effective detection model must be accurate, run in real time, and use a lightweight network architecture so that it balances precision and speed when deployed on embedded devices. To meet these requirements, we first curated a dataset of over 10,000 images of drones operating at low altitude. The dataset covers diverse and complex backgrounds, which improves the effectiveness of model training. We then applied a series of modifications to the YOLOv5 algorithm to achieve lightweight object detection. A new feature extraction network, CF2-MC, streamlines feature extraction, and a new module, MG, in the feature fusion stage is designed to improve detection accuracy while reducing model complexity. In addition, the original CIoU loss function was replaced with the EIoU loss function to further improve accuracy. Experimental results show improved drone detection accuracy, with mAP values of 95.4% on the UAVfly dataset and 82.2% on the Det-Fly dataset. Finally, real-world tests on an NVIDIA Jetson TX2 showed that the resulting YOLOv5s-ngn model achieves an average inference time of 14.5 milliseconds per image. The code for this paper is available at https://github.com/lucien22588/yolov5-ngn.git.
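The EIoU loss keeps CIoU's IoU and center-distance terms but replaces CIoU's aspect-ratio penalty with separate width and height penalties, each normalized by the width or height of the smallest enclosing box. The following is a minimal scalar sketch of that standard formulation, assuming boxes in (x1, y1, x2, y2) corner format. The function name `eiou_loss` and the `eps` guard are illustrative assumptions. This is not the code from the linked repository, which would operate on batched tensors.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    L = 1 - IoU + rho^2(centers) / c^2
          + (w - w_gt)^2 / C_w^2 + (h - h_gt)^2 / C_h^2
    where C_w, C_h are the width/height of the smallest enclosing box
    and c^2 = C_w^2 + C_h^2 is its squared diagonal.
    (Illustrative sketch; eps is an assumed guard against division by zero.)
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and ground-truth boxes
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection area, union area, and IoU
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Smallest enclosing box: width C_w, height C_h, squared diagonal c^2
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    # Squared distance between the two box centers
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4

    return (1 - iou
            + rho2 / (c2 + eps)                 # center-distance term (shared with CIoU)
            + (pw - tw) ** 2 / (cw ** 2 + eps)  # width penalty (EIoU-specific)
            + (ph - th) ** 2 / (ch ** 2 + eps)) # height penalty (EIoU-specific)
```

For example, `eiou_loss((0, 0, 2, 2), (1, 0, 3, 2))` gives 1 - 1/3 + 1/13 = 29/39 ≈ 0.744: the boxes have equal size, so only the IoU and center-distance terms contribute. Penalizing width and height errors directly, rather than through an aspect ratio, means the gradient still pushes each side length toward its target even when the aspect ratios already match.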