In the complex environment of greenhouses, it is important to provide the picking robot with accurate information. For this purpose, this paper improves a recognition and detection method based on You Only Look Once v5 (YOLO v5). First, data augmentation is added to boost the network's generalizability, and k-means clustering (KMC) is applied on the input end to obtain more suitable anchors, aiming to increase detection accuracy. Second, multi-scale feature extraction is enhanced by improving the spatial pyramid pooling (SPP) module. Finally, non-maximum suppression (NMS) is optimized to improve the accuracy of the network. Experimental results show that the improved YOLO v5 achieved a mean average precision (mAP) of 97.3%, a recall of 90.5%, and an F1-score of 92.0%, while the original YOLO v5 had a mAP of 95.9% and a recall of 85.6%; the improved YOLO v5 took 57 ms to identify and detect each image. The recognition accuracy and speed of the improved YOLO v5 are much better than those of the faster region-based convolutional neural network (Faster R-CNN) and YOLO v3. The improved network was then applied to images taken in unstructured environments with varying illumination, branch/leaf occlusions, and overlapping fruits. The results show that the improved network has good robustness, providing stable and reliable information for the operation of tomato picking robots.
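The anchor-refinement step described above, clustering ground-truth box sizes with k-means, is conventionally done with a 1 − IoU distance rather than Euclidean distance so that large and small boxes are treated fairly. A minimal sketch under that assumption (the function names and the synthetic data are illustrative, not from the paper):

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) box sizes and (w, h) centroids,
    treating all boxes as sharing the same top-left corner."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (w, h) sizes with 1 - IoU as the distance metric;
    returns k anchors sorted by area, as used for YOLO-style anchors."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # nearest centroid = highest IoU (i.e. smallest 1 - IoU)
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0)
                        if np.any(assign == i) else centroids[i]
                        for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]
```

In practice the resulting anchors replace the defaults in the model configuration before training, so that anchor shapes match the dataset's tomato bounding boxes.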
This paper proposes a method combining binocular vision and deep learning to identify and locate ripe tomatoes in greenhouses. First, the convolutional block attention module (CBAM) is added to the YOLO v3 model to improve its robustness to the greenhouse environment; the tomatoes identified by the improved YOLO v3-CBAM are then fused with the three-dimensional information obtained by the binocular stereo camera to obtain the three-dimensional position of each tomato fruit. In testing, the model achieved an accuracy of 89.15% for tomato recognition, an AP of 86.17%, and an F1-score of 82%, and the relative error of tomato fruit positioning was less than 1.5%. Finally, the model was deployed on a tomato picking robot in the greenhouse, verifying the practicability of the method.
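The fusion of 2-D detections with binocular depth typically relies on the standard stereo relations: disparity d gives depth Z = f·B/d, and the detected pixel is back-projected into camera coordinates. A minimal sketch of that step (the intrinsics and numbers in the usage are illustrative assumptions, not the paper's calibration):

```python
def pixel_to_3d(u, v, disparity, fx, fy, cx, cy, baseline):
    """Recover a 3-D point in the left-camera frame from a pixel (u, v)
    and its stereo disparity, using the pinhole model.
    Depth follows the standard relation Z = fx * baseline / disparity."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = fx * baseline / disparity          # depth along the optical axis
    x = (u - cx) * z / fx                  # back-project horizontal offset
    y = (v - cy) * z / fy                  # back-project vertical offset
    return x, y, z

# Illustrative values: fx = fy = 700 px, principal point (320, 240),
# 6 cm baseline, detected tomato center at the principal point.
x, y, z = pixel_to_3d(320, 240, disparity=14,
                      fx=700, fy=700, cx=320, cy=240, baseline=0.06)
```

Here the disparity at the detection's center (or a robust statistic over the bounding box) would come from the stereo camera's matching step; the returned (x, y, z) is what the robot arm consumes.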
In the process of picking tomatoes, mechanical error in the robotic arm means that tomato positions cannot be detected accurately, and no positioning feedback is available, which reduces picking efficiency. This article therefore proposes visual feedback and correction, and designs a lightweight detection method based on an improved YOLOv5s model whose backbone network is replaced with the lightweight ShuffleNetV2. In addition, a Bidirectional Feature Pyramid Network (BiFPN) is added to obtain richer feature information. Experimental results showed that the improved model achieved 97.4% mAP and 97.5% accuracy with a model size of 1.89 MB and an inference time of 4.8 ms per image. The detection method quickly computes the Euclidean distance between a reference point and the target tomato: a target with a Euclidean distance of at most 58.12 mm is picked successfully, while one with a Euclidean distance greater than 58.12 mm is not; in that case the error is calculated and fed back to the robot for another picking attempt. The whole process realizes information feedback and correction, improving picking efficiency with little feedback time.
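The feedback loop above reduces to a distance check against the reported 58.12 mm boundary, with the positional error returned to the arm when the check fails. A minimal sketch (the function name and the error-vector convention are our assumptions, not the paper's interface):

```python
import math

PICK_THRESHOLD_MM = 58.12  # success boundary reported in the experiments

def pick_feedback(reference, target, threshold=PICK_THRESHOLD_MM):
    """Decide whether the target tomato is within picking range.

    reference, target: (x, y, z) positions in millimetres.
    Returns (picked, error) where `error` is the correction vector
    fed back to the arm when the distance exceeds the threshold.
    """
    dist = math.dist(reference, target)
    if dist <= threshold:
        return True, (0.0, 0.0, 0.0)
    # Out of range: report the signed offset so the arm can re-approach.
    error = tuple(t - r for r, t in zip(reference, target))
    return False, error
```

On failure the robot applies the returned error as a correction and attempts the pick again, which is the feedback-and-correction cycle the abstract describes.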