Managing sloping terrain is a significant challenge worldwide because structured management practices are lacking, particularly where several fruit tree species are planted together in irregular arrangements. This study addresses the complexity of such mixed cultivation by using Unmanned Aerial Vehicle (UAV) imagery for tree species recognition and fruit tree classification. UAVs equipped with multispectral and optical cameras captured imagery over an experimental sloping site. The collected data were processed into orthophoto images, from which different types of fruit trees, roads, and buildings were classified. Convolutional Neural Networks (CNNs) were used for image recognition in this challenging hillside terrain, and three deep architectures, VGG-16, VGG-19, and ResNet-50, were applied and compared. VGG-16 achieved significant accuracy on the multispectral imagery. Several image fusion techniques were then explored, including Brovey, Hue-Saturation-Value (HSV), Principal Component Analysis (PCA), and Gram-Schmidt, with PCA demonstrating superior performance. The study found that image fusion, particularly with the near-infrared or red-edge bands, significantly enhanced prediction accuracy compared with standalone multispectral imagery. Combining visible-band fused imagery with additional spectral bands yielded the highest accuracy, improving overall prediction accuracy from 0.76 to 0.92. These findings offer insights applicable to other regions facing similar challenges in managing sloping terrain and mixed fruit tree cultivation.

INDEX TERMS Convolutional neural network, fruit trees classification, image fusion, multispectral imagery, unmanned aerial vehicle.
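
To make the notion of image fusion concrete, the following is a minimal sketch of a Brovey-style ratio fusion, one of the techniques named above (not the best-performing PCA variant, and not the authors' implementation). It assumes the multispectral bands have already been co-registered and resampled to the grid of a single high-resolution intensity image; the function name `brovey_fusion`, the array layout, and the synthetic data are all hypothetical.

```python
import numpy as np


def brovey_fusion(ms, pan, eps=1e-12):
    """Brovey-style fusion sketch (illustrative assumption, not the paper's code).

    ms  : array of shape (bands, H, W), multispectral bands resampled to the pan grid
    pan : array of shape (H, W), high-resolution intensity image
    Each band is rescaled by pan / mean(ms bands), so the fused bands keep their
    relative spectral ratios while their mean intensity follows the pan image.
    """
    ms = np.asarray(ms, dtype=np.float64)
    pan = np.asarray(pan, dtype=np.float64)
    if ms.ndim != 3 or ms.shape[1:] != pan.shape:
        raise ValueError("ms must be (bands, H, W) and pan must be (H, W)")
    intensity = ms.mean(axis=0)        # per-pixel mean over the spectral bands
    ratio = pan / (intensity + eps)    # eps guards against division by zero
    return ms * ratio                  # ratio broadcasts across the band axis


# Synthetic data standing in for real UAV bands (hypothetical values).
rng = np.random.default_rng(0)
ms = rng.uniform(0.1, 1.0, size=(3, 4, 4))
pan = rng.uniform(0.1, 1.0, size=(4, 4))
fused = brovey_fusion(ms, pan)
```

In this sketch the per-pixel mean of the fused bands equals the high-resolution image, which is the property that lets the fused product inherit spatial detail while preserving band ratios.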