The accurate identification of citrus fruits is important for fruit yield estimation in complex citrus orchards. In this study, the YOLOv7-tiny-BVP network is constructed based on the YOLOv7-tiny network, with citrus fruits as the research object. This network introduces a BiFormer bilevel routing attention mechanism, which replaces regular convolution with GSConv, adds the VoVGSCSP module to the neck network, and replaces the simplified efficient layer aggregation network (ELAN) with partial convolution (PConv) in the backbone network. The improved model significantly reduces the number of model parameters and the model inference time, while maintaining the network’s high recognition rate for citrus fruits. The results showed that the fruit recognition accuracy of the modified model was 97.9% on the test dataset. Compared with the YOLOv7-tiny, the number of parameters and the size of the improved network were reduced by 38.47% and 4.6 MB, respectively. Moreover, the recognition accuracy, frames per second (FPS), and F1 score improved by 0.9, 2.02, and 1%, respectively. The network model proposed in this paper has an accuracy of 97.9% even after the parameters are reduced by 38.47%, and the model size is only 7.7 MB, which provides a new idea for the development of a lightweight target detection model.