The precise detection of dense small targets in orchards is critical for the visual perception of agricultural picking robots. At present, visual detection algorithms for plums still perform poorly because plums are small and grow densely. Thus, this paper proposes a lightweight model based on an improved You Only Look Once version 4 (YOLOv4) to detect dense plums in orchards. First, we employed a data augmentation method based on category balance to alleviate both the imbalance in the number of plums at different maturity levels and the insufficient data quantity. Second, we replaced Cross Stage Partial Darknet53 (CSPDarknet53) with the lighter MobileNetV3 as the backbone feature extraction network. In the feature fusion stage, we used depthwise separable convolution (DSC) instead of standard convolution to reduce the model parameters. To address the insufficient feature extraction of dense targets, the model achieves fine-grained detection by introducing a 152 × 152 feature layer. Focal loss and complete intersection over union (CIoU) loss were combined to balance the contributions of hard-to-classify and easy-to-classify samples to the total loss. The improved model was then trained through transfer learning in stages. Finally, several groups of detection experiments were designed to evaluate the performance of the improved model. The results showed that the improved YOLOv4 model achieved better mean average precision (mAP) than YOLOv4, YOLOv4-tiny, and MobileNet-Single Shot Multibox Detector (MobileNet-SSD). Compared with the original YOLOv4 model, the improved model's size is compressed by 77.85%, its parameters are only 17.92% of the original, and its detection speed is 112% faster. In addition, this paper discusses the influence of the automatic data balance algorithm on model accuracy and the detection performance of the improved model under different illumination angles, illumination intensities, and types of occlusion. The results indicate that the improved detection model is robust and accurate in real natural environments and can provide a data reference for subsequent orchard yield estimation and for the engineering application of robotic picking.
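As a rough illustration of the parameter saving the abstract attributes to depthwise separable convolution, the PyTorch sketch below contrasts a standard 3 × 3 convolution with its depthwise-plus-pointwise factorization; the channel counts and activation choice are illustrative assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A 3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Per-channel spatial filtering (groups == in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # 1x1 cross-channel mixing.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)  # activation choice is an assumption

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameter comparison at an illustrative 256 -> 256 channel width:
std = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)  # 589,824 weights
dsc = DepthwiseSeparableConv(256, 256)                           # ~68,352 weights
print(sum(p.numel() for p in std.parameters()),
      sum(p.numel() for p in dsc.parameters()))  # roughly an 8.7x reduction
```

The saving grows with channel width, which is why swapping DSC into the feature-fusion stage yields a large cut in total parameters.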
Using unmanned aerial vehicles (UAVs) to detect litchi fruits precisely and conveniently, in order to estimate yields and perform statistical analysis, holds significant value in the complex and variable litchi orchard environment. Currently, litchi yield estimation relies predominantly on rough manual counts, which often produce estimates that deviate from actual production. This study proposes a large-scene, high-density litchi fruit recognition method based on an improved You Only Look Once version 5 (YOLOv5) model, with the main objective of improving the accuracy and efficiency of yield estimation in natural orchards. First, the PANet in the original YOLOv5 model is replaced with an improved Bi-directional Feature Pyramid Network (BiFPN) to strengthen the model's cross-scale feature fusion. Second, the P2 feature layer is fused into the BiFPN to improve the model's learning of high-resolution features. Then, the Normalized Gaussian Wasserstein Distance (NWD) metric is introduced into the regression loss function to improve the model's learning of tiny litchi targets. Finally, Slicing Aided Hyper Inference (SAHI) is used to enhance the detection of tiny targets without increasing the model's parameters or memory footprint. The experimental results show that the overall AP of the improved YOLOv5 model increased by 22% compared with the original model's AP of 50.6%; in particular, the APs for small targets increased from 27.8% to 57.3%. The model size is only 3.6% larger than that of the original YOLOv5 model. Ablation and comparative experiments show that the method improves accuracy without compromising model size or inference speed. Therefore, the proposed method is practical for detecting litchi fruits in orchards and can guide litchi yield estimation and subsequent harvesting. Future research can continue to optimize small target detection and extend the work to small target tracking in dense scenes, which is of great significance for litchi yield estimation.
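For readers unfamiliar with the NWD metric mentioned above, the following is a minimal PyTorch sketch of the Normalized Gaussian Wasserstein Distance from the tiny-object-detection literature, which models each box as a 2-D Gaussian. The constant C is dataset-dependent; the default below is a placeholder assumption, not this study's setting.

```python
import torch

def nwd(boxes1: torch.Tensor, boxes2: torch.Tensor, C: float = 12.8) -> torch.Tensor:
    """Normalized Gaussian Wasserstein Distance between (cx, cy, w, h) boxes.

    Each box is modeled as a 2-D Gaussian with mean (cx, cy) and covariance
    diag((w/2)^2, (h/2)^2); the squared 2nd-order Wasserstein distance between
    two such Gaussians has the closed form computed below.
    """
    cx1, cy1, w1, h1 = boxes1.unbind(-1)
    cx2, cy2, w2, h2 = boxes2.unbind(-1)
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0)
    return torch.exp(-torch.sqrt(w2_sq) / C)  # in (0, 1]; higher means more similar

# A regression penalty can then be written as 1 - nwd(pred, gt); how the study
# weights this term against the original IoU-based loss is not shown here.
```

Unlike IoU, NWD stays smooth and informative when tiny predicted and ground-truth boxes barely overlap, which is the failure mode the abstract targets.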
The fast and precise detection of dense litchi fruits and the determination of their maturity are of great practical significance for yield estimation in litchi orchards and for robotic harvesting. Factors such as the complex growth environment, dense distribution, and random occlusion by leaves, branches, and other litchi fruits easily cause computer-vision predictions to deviate from actual values. This study proposed a fast and precise litchi fruit detection method, with accompanying application software, based on an improved You Only Look Once version 5 (YOLOv5) model, which can be used for litchi detection and yield estimation in orchards. First, a dataset of litchi at different maturity levels was established. Second, the YOLOv5s model was chosen as the base of the improved model. ShuffleNet v2 was used as the improved backbone network and was then fine-tuned to simplify the model structure. In the feature fusion stage, the CBAM module was introduced to further refine litchi's effective feature information. Considering the small size of dense litchi fruits, an input size of 1,280 × 1,280 was used while the network structure was optimized. To evaluate the proposed method, we performed ablation experiments and compared it with other models on the test set. The results showed that the improved model's mean average precision (mAP) improved by 3.5% while its size was compressed by 62.77% compared with the original model. The improved model is 5.1 MB and runs at 78.13 frames per second (FPS) at a confidence threshold of 0.5. The model performs well in precision and robustness across different scenarios. In addition, we developed an Android application for litchi counting and yield estimation based on the improved model; in experiments, the correlation coefficient R² between the application's results and the actual results was 0.9879. In summary, the improved method achieves precise, lightweight, and fast detection at large scales and can provide a technical means for portable yield estimation and for the visual recognition of litchi harvesting robots.
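As a usage-level sketch of the counting step such an application performs, the snippet below loads custom-trained weights through the official YOLOv5 torch.hub interface and counts detections at the abstract's 1,280-pixel input size and 0.5 confidence threshold; the weights file name and image path are hypothetical, and the actual Android app likely uses an exported mobile runtime rather than this desktop pipeline.

```python
import torch

# Load custom weights through the official YOLOv5 hub entry point.
# 'litchi_best.pt' and 'orchard.jpg' are hypothetical placeholders.
model = torch.hub.load("ultralytics/yolov5", "custom", path="litchi_best.pt")
model.conf = 0.5  # confidence threshold used in the abstract's FPS figure

results = model("orchard.jpg", size=1280)  # 1,280 x 1,280 inference size
litchi_count = len(results.xyxy[0])        # one detection row per fruit
print(f"Detected {litchi_count} litchi fruits")
```

Per-image counts aggregated over trees or plots are then the raw input to the yield estimate that was validated against ground truth (R² = 0.9879).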