Picking robots have become an important development direction of smart agriculture, and the position detection of fruit is the key to realizing robot picking. However, the existing detection models have the shortcomings of missing detection and slow detection speed when detecting dense and occluded grape targets. Meanwhile, the parameters of the existing model are too large, which makes it difficult to deploy to the mobile terminal. In this paper, a lightweight GA-YOLO model is proposed. Firstly, a new backbone network SE-CSPGhostnet is designed, which greatly reduces the parameters of the model. Secondly, an adaptively spatial feature fusion mechanism is used to address the issues of difficult detection of dense and occluded grapes. Finally, a new loss function is constructed to improve detection efficiency. In 2022, a detection experiment was carried out on the image data collected in the Bagui rural area of Guangxi Zhuang Autonomous Region, the results demonstrate that the GA-YOLO model has an mAP of 96.87%, detection speed of 55.867 FPS and parameters of 11.003 M. In comparison to the model before improvement, the GA-YOLO model has improved mAP by 3.69% and detection speed by 20.245 FPS. Additionally, the GA-YOLO model has reduced parameters by 82.79%. GA-YOLO model not only improves the detection accuracy of dense and occluded targets but also lessens model parameters and accelerates detection speed.