The location and number of individual fruit trees (IFTs) are critical for surveying planting areas, predicting fruit yield, and planning and managing smart orchards. These data are conventionally obtained through manual surveys and statistical investigations, which are slow, laborious, and costly. Deep learning object detection models offer a way to detect IFTs accurately, which is essential for obtaining these data rapidly and reducing human error. This study proposed an approach for detecting IFTs and mapping their spatial distribution by integrating deep learning with unmanned aerial vehicle (UAV) remote sensing. UAV remote sensing was used to collect high-resolution images of fruit trees in pomelo orchards in Meizhou, South China. Based on these images, a new individual pomelo tree image sample (IPTIS) dataset was created through manual interpretation and field investigation. Five YOLOv5 models of different scales (YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, whose layers, parameters, and floating-point operations all increase with the depth and width of the network) were considered for optimization, and the evaluation results showed that YOLOv5s was the best among them. Moreover, the coordinate attention (CA)-optimized YOLOv5 model (YOLOv5s-CA) is the best model (named Manuscript