Mechanization and intelligent automation of the production process are major trends in agricultural research and development, and unmanned, automated picking has been one of the main research hotspots in China's agricultural engineering field in recent years. The development of automated apple-picking robots depends directly on imaging research, whose key technology is algorithmic apple identification and positioning. To address the false and missed detections of densely occluded targets and small targets by apple-picking robots under different lighting conditions, two recognition algorithms were selected based on apple shape features: a traditional machine learning approach, histogram of oriented gradients with a support vector machine (HOG + SVM), and a fast multi-target apple recognition method for complex occlusion environments based on an improved You-Only-Look-Once-v5 (YOLOv5). The first improvement concerns the CSP structure of the network: using parameter reconstruction, the convolutional layer (Conv) and the batch normalization (BN) layer in the CBL (Conv + BN + Leaky ReLU activation) module are fused into a single batch-normalized convolutional layer, Conv_B. Next, a coordinate attention (CA) module is embedded at different layers of the redesigned backbone network to strengthen feature representation and better extract the features of different apple targets. Finally, for targets with overlapping occlusions, the loss function is fine-tuned to improve the model's ability to recognize occluded targets. Compared with HOG + SVM, Faster RCNN, YOLOv6, and the baseline YOLOv5 on the test set under complex occlusion scenarios, the
F1 score of this method increased by 13.47%, 6.01%, 1.26%, and 3.63%, respectively, and under different illumination angles the F1 score increased by 19.36%, 13.07%, 1.61%, and 4.27%, respectively. The average per-image recognition time was 0.27 s faster than that of HOG + SVM, 0.229 s faster than that of Faster RCNN, and 0.006 s faster than that of YOLOv6. The method is expected to provide a theoretical basis for apple-picking robots to choose an appropriate image recognition algorithm during operation.
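The Conv + BN fusion behind the Conv_B layer follows the standard batch-normalization folding identity: since BN at inference is a per-channel affine transform, it can be absorbed into the preceding convolution's weights and bias. The sketch below illustrates that identity with NumPy; the function name `fuse_conv_bn` and the 1x1-convolution check are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def fuse_conv_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time BN parameters into the preceding convolution.

    W: (out_ch, in_ch, kh, kw) conv weights, b: (out_ch,) conv bias.
    gamma, beta, mean, var: per-channel BN parameters, shape (out_ch,).
    Returns (W_f, b_f) such that conv(x, W_f, b_f) == BN(conv(x, W, b)).
    """
    scale = gamma / np.sqrt(var + eps)      # per-output-channel BN scale
    W_f = W * scale[:, None, None, None]    # rescale each output filter
    b_f = beta + (b - mean) * scale         # shift the bias accordingly
    return W_f, b_f

# Equivalence check, using a 1x1 convolution written as a matrix product.
rng = np.random.default_rng(0)
out_ch, in_ch = 4, 3
W = rng.standard_normal((out_ch, in_ch, 1, 1))
b = rng.standard_normal(out_ch)
gamma, beta = rng.standard_normal(out_ch), rng.standard_normal(out_ch)
mean, var = rng.standard_normal(out_ch), rng.random(out_ch) + 0.5

x = rng.standard_normal(in_ch)              # one pixel's input features
y = W[:, :, 0, 0] @ x + b                   # plain conv output
bn = gamma * (y - mean) / np.sqrt(var + 1e-5) + beta  # conv then BN

W_f, b_f = fuse_conv_bn(W, b, gamma, beta, mean, var)
y_f = W_f[:, :, 0, 0] @ x + b_f             # single fused conv
assert np.allclose(bn, y_f)
```

Because the fused layer performs one linear operation instead of two, it reduces per-image inference cost without changing the network's outputs, which is consistent with the speed gains reported above.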