Pre-harvest fruit yield estimation is useful in guiding harvesting and marketing resourcing, but machine vision estimates based on a single view from each side of the tree (“dual-view”) underestimate fruit yield because fruit can be hidden from view. A method is proposed involving deep learning, a Kalman filter, and the Hungarian algorithm for on-tree mango fruit detection, tracking, and counting from 10 frame-per-second videos of trees captured from a platform moving along the inter-row at 5 km/h. The deep-learning-based mango fruit detection algorithm MangoYOLO was used to detect fruit in each frame. The Hungarian algorithm was used to correlate fruit between neighbouring frames, with the improvement of enabling multiple-to-one assignment. The Kalman filter was used to predict the position of fruit in following frames, to avoid multiple counts of a single fruit that is obscured or otherwise not detected within a frame series. A “borrow” concept was added to the Kalman filter to predict a fruit’s position when its own motion model was not yet established, by borrowing the horizontal and vertical speed of neighbouring fruit. In comparison with a human count for a video of 110 frames containing 192 fruit (human count), the method produced 9.9% double counts and 7.3% missed counts, resulting in an over-count of around 2.6%. In another test, a video (of 1162 frames, with 42 images centred on the tree trunk) was acquired of both sides of a row of 21 trees, for which the harvest fruit count was 3286 (i.e., an average of 156 fruit/tree). The trees had thick canopies, such that the proportion of fruit hidden from view from any given perspective was high. The proposed method recorded 2050 fruit (62% of harvest) with a bias-corrected Root Mean Square Error (RMSE) of 18.0 fruit/tree, while the dual-view image method (also using MangoYOLO) recorded 1322 fruit (40%) with a bias-corrected RMSE of 21.7 fruit/tree.
The video tracking system is recommended over the dual-view imaging system for mango orchard fruit count.
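The detect–associate–predict loop described above can be illustrated with a minimal sketch. The class and function names below are illustrative, not from the paper; the full Kalman state and the multiple-to-one assignment extension are omitted, and `scipy.optimize.linear_sum_assignment` stands in for the Hungarian algorithm. Only the “borrow” idea (a new track taking the mean velocity of neighbouring tracks) follows the abstract directly.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


class Track:
    """Simplified per-fruit track with a constant-velocity motion model."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.vx = self.vy = None  # no motion model until a second observation

    def update(self, x, y):
        # New observation: velocity is the frame-to-frame displacement.
        self.vx, self.vy = x - self.x, y - self.y
        self.x, self.y = x, y

    def predict(self, all_tracks):
        """Predict next position; 'borrow' the mean velocity of
        neighbouring tracks when this track has no model of its own."""
        if self.vx is None:
            known = [t for t in all_tracks if t.vx is not None]
            if known:  # the "borrow" concept
                vx = float(np.mean([t.vx for t in known]))
                vy = float(np.mean([t.vy for t in known]))
            else:
                vx = vy = 0.0
        else:
            vx, vy = self.vx, self.vy
        return self.x + vx, self.y + vy


def associate(tracks, detections, max_dist=80.0):
    """Optimal one-to-one assignment of predicted track positions to
    detections by Euclidean distance (Hungarian algorithm)."""
    preds = [t.predict(tracks) for t in tracks]
    cost = np.array([[np.hypot(px - dx, py - dy) for dx, dy in detections]
                     for px, py in preds])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```

In use, each frame's MangoYOLO detections would be passed to `associate`; matched tracks are updated, unmatched detections start new tracks, and unmatched tracks coast on their predicted positions rather than being counted again.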
In-field (on-tree) fruit sizing has value in assessing crop health and in yield estimation. As the mobile phone is a sensor- and communication-rich device carried by almost all farm staff, an Android application (“FruitSize”) was developed for in-field measurement of fruit size using the phone camera, with a typical assessment rate of 240 fruit per hour achieved. The application was based on imaging of fruit against a backboard with a scale, with operational limits set on camera-to-object plane angle and camera-to-object distance. Image processing and object segmentation techniques available in the OpenCV library were used to segment fruit from background in images to obtain fruit sizes. Phone camera parameters were accessed to allow calculation of fruit size, with the camera-to-fruit-perimeter distance obtained from allometric relationships between fruit thickness and width. Phone geolocation data were also accessed, allowing mapping of fruit size data. Under controlled lighting, RMSEs of 3.4, 3.8, 2.4, and 2.0 mm were achieved in estimation of avocado, mandarin, navel orange, and apple fruit diameter, respectively. For mango fruit, RMSEs of 5.3 and 3.7 mm were achieved for length and width, respectively, benchmarked to manual caliper measurements under controlled lighting, and RMSEs of 5.5 and 4.6 mm were obtained in-field under ambient lighting.
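The geometry behind the size calculation can be sketched as follows, under simplified assumptions: the scale printed on the backboard gives a mm-per-pixel factor at the board plane, and the fruit's widest silhouette sits closer to the camera by its thickness, estimated allometrically from width. The allometric coefficients `a` and `b` are hypothetical placeholders, not values from the application.

```python
def fruit_width_mm(px_width, mm_per_px_board, board_dist_mm, a=0.0, b=0.8):
    """Estimate fruit width from its pixel width in the image.

    px_width        -- segmented fruit width in pixels
    mm_per_px_board -- scale at the backboard plane, from the printed scale
    board_dist_mm   -- camera-to-backboard distance
    a, b            -- hypothetical allometric coefficients:
                       thickness ~ a + b * width
    """
    width = px_width * mm_per_px_board  # first pass: fruit at board plane
    for _ in range(3):  # refine: fruit perimeter is closer by its thickness
        thickness = a + b * width
        width = (px_width * mm_per_px_board
                 * (board_dist_mm - thickness) / board_dist_mm)
    return width
```

The few fixed-point iterations converge quickly because the thickness correction is small relative to the camera-to-board distance; with `b = 0` the estimate reduces to the plain board-plane scale.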
Eight depth cameras varying in operational principle (stereoscopy: ZED, ZED2, OAK-D; IR active stereoscopy: RealSense D435; time of flight (ToF): RealSense L515, Kinect v2, Blaze 101, Azure Kinect) were compared in the context of use for in-orchard fruit localization and sizing. For this application, a specification of a bias-corrected root mean square error (RMSE) of 20 mm at a camera-to-fruit distance of 2 m, under sunlit field conditions, was set. The ToF cameras achieved the measurement specification, with the Blaze 101 and Azure Kinect recommended in terms of operation in sunlight and in-orchard conditions. For a camera-to-fruit distance of 1.5 m in sunlight, the Azure Kinect measurement achieved an RMSE of 6 mm, a bias of 17 mm, an SD of 2 mm and a fill rate of 100% for depth values of a central 50 × 50 pixel group. To enable inter-study comparisons, it is recommended that future assessments of depth cameras for this application include estimation of bias-corrected RMSE and bias on estimated camera-to-fruit distances at 50 cm intervals to 3 m, under both artificial light and sunlight, with characterization of image distortion and estimation of fill rate.
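The recommended evaluation metrics for a depth patch can be computed as below; this is a minimal sketch of the protocol described above (function name and the zero-means-missing convention are assumptions), applied to the central 50 × 50 pixel group at a known camera-to-target distance.

```python
import numpy as np


def depth_patch_metrics(depth_mm, true_dist_mm):
    """Bias, SD, bias-corrected RMSE and fill rate for a depth patch.

    depth_mm     -- 2-D array of depth readings in mm (0 = no return,
                    an assumed missing-value convention)
    true_dist_mm -- known camera-to-target distance in mm
    """
    d = np.asarray(depth_mm, dtype=float)
    valid = d[d > 0]                       # drop missing depth returns
    fill_rate = valid.size / d.size        # proportion of valid pixels
    err = valid - true_dist_mm
    bias = err.mean()
    sd = err.std()                         # spread of errors about the bias
    rmse_bc = np.sqrt(np.mean((err - bias) ** 2))  # equals sd by definition
    return bias, sd, rmse_bc, fill_rate
```

Note that the bias-corrected RMSE is numerically the standard deviation of the errors, which is why reporting bias and bias-corrected RMSE separately, as recommended, fully characterizes the distance error.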
Machine vision from ground vehicles is being used for estimation of fruit load on trees, but a correction is required for occlusion of fruit by foliage or other fruit. This correction currently relies on a manually estimated factor (the reference method). It was hypothesised that canopy images could hold information related to the number of occluded fruit. Several image features, such as the proportion of fruit that were partly occluded, were used in training random forest and multi-layer perceptron (MLP) models for estimation of a correction factor per tree. In another approach, deep learning convolutional neural networks (CNNs) were trained directly against harvest count of fruit per tree. An R2 of 0.98 (n = 98 trees) was achieved for the correlation between fruit count predicted by a random forest model and the ground-truth fruit count, compared to an R2 of 0.68 for the reference method. Error in prediction of whole-orchard (880 trees) fruit load, compared to packhouse count, was 1.6% for the MLP model and 13.6% for the reference method. However, the performance of these models on data from another season was at best equivalent to, and generally poorer than, the reference method. This result indicates that training on one season of data was insufficient for the development of a robust model.
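The per-tree correction-factor approach can be sketched as follows, on entirely synthetic data: image features (e.g., proportion of partly occluded fruit) are regressed against a factor defined as harvest count divided by machine-vision count, and the fitted model then scales raw counts. The feature names, coefficients, and data below are illustrative assumptions, not values from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 98  # trees, matching the study's n; the data itself is synthetic

# Hypothetical per-tree image features
frac_occluded = rng.uniform(0.1, 0.6, n)   # proportion of fruit partly occluded
canopy_area = rng.uniform(5.0, 15.0, n)    # m^2, e.g. from canopy segmentation
mv_count = rng.integers(40, 160, n).astype(float)  # raw machine-vision count

# Synthetic "true" correction factor (harvest count / machine-vision count)
factor = 1.2 + 1.5 * frac_occluded + 0.02 * canopy_area

X = np.column_stack([frac_occluded, canopy_area, mv_count])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, factor)

# Corrected fruit count per tree = raw count scaled by the predicted factor
corrected = mv_count * model.predict(X)
```

The cross-season failure reported above corresponds to fitting and evaluating on data like this from one "season": a model tuned to one season's occlusion-feature distribution need not transfer to another.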
The performance of a multi-view machine vision method was documented at an orchard level, relative to packhouse count. High repeatability was achieved in night-time imaging, with an absolute percentage error of 2% or less. Canopy architecture impacted performance: reasonable estimates were achieved on hedge, single-leader and conventional systems (average percentage errors of 3.4, 5.0 and 8.2%, respectively), while fruit load of trellised orchards was over-estimated (average percentage error of 25.2%). Yield estimations were made for multiple orchards via: (i) human count of fruit load on ~5% of trees (FARM), (ii) human count of 18 trees randomly selected within three NDVI stratifications (CAL), (iii) multi-view counts (MV-Raw), and (iv) multi-view counts corrected for occluded fruit using manual counts of the CAL trees (MV-CAL). Across the nine orchards for which results for all methods were available, the FARM, CAL, MV-Raw and MV-CAL methods achieved average percentage errors on packhouse counts of 26, 13, 11 and 17%, with SDs of 11, 8, 11 and 9%, respectively, in the 2019–2020 season. The absolute percentage error of the MV-Raw estimates was 10% or less in 15 of the 20 orchards assessed. Greater error in load estimation occurred in the 2020–2021 season, attributed to the time-spread of flowering. Use cases for the tree-level fruit load data were explored in the context of fruit load density maps, used to inform early harvesting and to interpret crop damage, and of tree frequency distributions of fruit load per tree.
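The MV-CAL estimate and the percentage-error metric used above can be sketched in a few lines; this is an assumed, simplified form of the correction (a mean manual-to-machine-vision ratio over the calibration trees, applied to the raw orchard count), and the function names are illustrative.

```python
def mv_cal_estimate(cal_manual, cal_mv, mv_raw_total):
    """Scale a raw multi-view orchard count by the mean ratio of manual
    to machine-vision counts on the calibration (CAL) trees."""
    factor = sum(m / v for m, v in zip(cal_manual, cal_mv)) / len(cal_manual)
    return mv_raw_total * factor


def pct_error(estimate, packhouse):
    """Absolute percentage error of a yield estimate against packhouse count."""
    return 100.0 * abs(estimate - packhouse) / packhouse
```

For example, if the 18 CAL trees averaged twice as many fruit by manual count as by machine vision, an MV-Raw orchard total of 5000 fruit would be corrected to 10000 before comparison with the packhouse count.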