Predicting within‐field cotton yields using publicly available datasets and machine learning

Leo, Stephen; Migliorati, Massimiliano De Antoni; Grace, Peter

doi:10.1002/agj2.20543

Cited by 25 publications

(10 citation statements)

References 58 publications


“…In GBM, weak learners (variables) are sequentially converted to strong learners to decrease bias from highly correlated variables. In contrast, RF generates models (trees) from a random subset of the training data and averages all trees to reduce the variance (Leo et al., 2021).…”

Section: Discussion (mentioning)

confidence: 99%
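
The contrast drawn in the statement above (boosting fits learners sequentially, whereas RF averages models trained on random subsets of the data) can be illustrated with a minimal pure-Python sketch. It is a toy under stated assumptions and not the method of Leo et al. (2021): the learners are one-split regression stumps on a single made-up predictor, the bagging function omits the per-split feature subsampling of a full random forest, and every function name and parameter value is hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression stump to 1-D data by least squares."""
    best = None
    # The largest x is excluded so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        const = mean(ys)
        return lambda x: const
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=50, lr=0.1):
    """Boosting: each stump is fitted to the residuals left by the previous ones."""
    base = mean(ys)
    residuals = [y - base for y in ys]
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - lr * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

def bag(xs, ys, n_models=50, seed=0):
    """Bagging: independent stumps on bootstrap resamples, then averaged."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Made-up nonlinear toy data, for illustration only.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

boosted_few = boost(xs, ys, rounds=5)
boosted_many = boost(xs, ys, rounds=100)
bagged = bag(xs, ys)
```

In this sketch, adding boosting rounds lowers the training error because each stump corrects what the earlier ones missed, while the bagged model only averages independently fitted stumps.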

“…The use of machine learning (ML) approaches can capture nonlinear relationships and has shown a strong ability to predict crop yield (Paudel et al., 2022a; Shendryk et al., 2021) when compared with traditional linear approaches (Filippi et al., 2019a; Shahhosseini et al., 2020). Furthermore, among the widely used algorithms, random forests (RF) and gradient boosting machines (GBM) have shown high predictive ability and improved accuracies (Leo et al., 2021). Artificial neural networks are also widely used in predictive modeling and soil mapping (Schillaci et al., 2021).…”

Section: Introduction (mentioning)

confidence: 99%


A machine learning modeling framework for Triticum turgidum subsp. durum Desf. yield forecasting in Italy

et al. 2023


The forecasting of crop yield is one of the most critical research areas in crop science, which allows for the development of decision support systems, optimization of nitrogen fertilization, and food safety. Many tested modeling approaches can be differentiated according to the models and data used. The models used are traditional crop models that require data that are often difficult to measure. New modeling approaches based on artificial intelligence algorithms have proven to be of high performance, flexible, and can be tested based on available data. In this study, four independent field experiments conducted on Triticum turgidum subsp. durum Desf. in central–southern Italy were used to train a set of machine learning (ML) algorithms to predict the yield using 16 variables: fertilization, nitrogen management, pedoclimatic, and remote sensing data. Four ML algorithms were calibrated and validated over two independent sites, and a linear regression model was used as a control. The calibrated models can predict the grain yield in the two regions by using ancillary data, topsoil physical and chemical properties, multispectral drone imagery, climatic data, and nitrogen fertilizer applied at the site. Among the four ML algorithms, stochastic gradient boosting (root‐mean‐square error = 0.58 t ha−1) outperformed others during calibration and transferability. Nitrogen application rate, seasonal precipitation, and temperature are the most important features for predicting wheat yield.


“…Although this saturation may be partially contained when selecting the bands to be analysed, the best alternative is to use three-band indices (Verrelst et al, 2013, 2015). To overcome the saturation problem, new VIs have been developed (Fadaei, 2020; Talukdar et al, 2020; Leo et al, 2021). Verrelst et al (2015) evaluated many vegetation indices generated from Sentinel-2 data and found that the best indices matched the three-band indices according to the normalised formula (ρ560 - ρ1610 - ρ2190) / (ρ560 + ρ1610 + ρ2190).…”

Section: Remote Sensing and Atmospheric Events (mentioning)

confidence: 99%
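
The normalised three-band formula quoted in the statement above can be written as a short function. This is a minimal sketch: the function name is hypothetical, the reflectance values in the example are made up, and the subscripts 560, 1610 and 2190 are assumed to be band-centre wavelengths in nanometres.

```python
def three_band_index(r560, r1610, r2190):
    """Normalised three-band index: (r560 - r1610 - r2190) / (r560 + r1610 + r2190)."""
    denom = r560 + r1610 + r2190
    if denom == 0:
        raise ValueError("sum of reflectances is zero; index is undefined")
    return (r560 - r1610 - r2190) / denom

# Made-up surface reflectances for one pixel, for illustration only.
value = three_band_index(0.08, 0.20, 0.10)
```

For non-negative reflectances the result stays within [-1, 1], which is what makes the index comparable across scenes.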

Assessment of hail damages in maize using remote sensing and comparison with an insurance assessment: A case study in Lombardy

Schillaci, Inverardi, Battaglia et al. 2022

Ital J Agronomy


Studies have shown that the quantification of hail damage is generally inaccurate and is influenced by the experience of the field surveyors/technicians. To overcome this problem, the vegetation indices retrieved by remote sensing, can be used to get information about the hail damage. The aim of this work is the detection of medium-low damages (i.e., between 10 and 30% of the gross saleable production) using the much-used normalized difference vegetation index (NDVI) in comparison with alternative vegetation indices (i.e., ARVI, MCARI, SAVI, MSAVI, MSAVI2) and their change from pre-event to post-event in five hailstorms in Lombardy in 2018. Seventy-four overlapping scenes (10% cloud cover) were collected from the Sentinel-2 in the spring-summer period of 2018 in the Brescia district (Lombardy). An unsupervised classification was carried out to automatically identify the maize fields (grain and silage), testing the change detection approach by searching for damage by hail and strong wind in the Lombardy plain of Brescia. A database of 125 field surveys (average size 4 Ha) after the hailstorm collected from the insurance service allowed for the selection of the dates on which the event occurred and provided a proxy of the extent of the damage (in % of the decrease of the yield). Hail and strong wind damages ranged from 5 to 70%, and they were used for comparison with the satellite image change detection. The differences in the vegetation indices obtained by Sentinel 2 before and after the hailstorm and the insurance assessments of damage after the events were compared to assess the degree of concordance. The modified soil-adjusted vegetation index outperformed other vegetation indices in detecting hail-related damages with the highest accuracy (73.3%). On the other hand, the NDVI resulted in scarce performance ranking last of the six indices, with an accuracy of 65.3%. 
Future research will evaluate how much uncertainty can be found in the method’s limitations with vegetation indices derived from satellites, how much is due to errors in estimating damage to the ground, and how much is due to other causes.

Highlights
- The discovery rate of damaged fields improved.
- MSAVI outperformed NDVI and other vegetation indices, identifying 73.3% of occurrences.
- Estimation of damage from remote sensing was more accurate for fields severely affected >50%.
- In low-intensity hail events (<50 canopies affected), the MSAVI provided a detailed picture of the damage across the field.
- The proposed approach is promising to develop a ‘sampling map’ for detailed on-ground assessment.
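
The pre-event versus post-event comparison described in this abstract can be sketched for a single pixel as follows. This is an illustration under assumptions, not the authors' processing chain: NDVI uses its standard definition, the commonly used closed-form MSAVI2 expression stands in for the soil-adjusted family (the abstract lists MSAVI and MSAVI2 separately and does not give formulas), and the reflectance values and the `index_drop` helper are hypothetical.

```python
import math

def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def msavi2(nir, red):
    """Closed-form MSAVI2, used here as a stand-in for the MSAVI family."""
    return (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2

def index_drop(index_fn, pre, post):
    """Change in an index from pre-event to post-event.

    pre and post are (nir, red) reflectance pairs; a positive result
    means the index declined after the event.
    """
    return index_fn(*pre) - index_fn(*post)

# Made-up reflectances for one maize pixel before and after a hailstorm.
pre_event = (0.50, 0.10)
post_event = (0.30, 0.10)

ndvi_drop = index_drop(ndvi, pre_event, post_event)
msavi2_drop = index_drop(msavi2, pre_event, post_event)
```

A per-field damage proxy would then aggregate such drops over all pixels in the field; how the study thresholded or classified them is not stated in the abstract.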


“…Leo et al. (Leo et al., 2021) employed RFE within the packing method to choose spectral indices. This led to an improved prediction accuracy for the model when assessing the correlation between predicted and observed scores.…”

Section: Introduction (mentioning)

confidence: 99%

An integrated feature selection approach to high water stress yield prediction

Li, Zhou, Cheng et al. 2023

Front. Plant Sci.


The timely and precise prediction of winter wheat yield plays a critical role in understanding food supply dynamics and ensuring global food security. In recent years, the application of unmanned aerial remote sensing has significantly advanced agricultural yield prediction research. This has led to the emergence of numerous vegetation indices that are sensitive to yield variations. However, not all of these vegetation indices are universally suitable for predicting yields across different environments and crop types. Consequently, the process of feature selection for vegetation index sets becomes essential to enhance the performance of yield prediction models. This study aims to develop an integrated feature selection method known as PCRF-RFE, with a focus on vegetation index feature selection. Initially, building upon prior research, we acquired multispectral images during the flowering and grain filling stages and identified 35 yield-sensitive multispectral indices. We then applied the Pearson correlation coefficient (PC) and random forest importance (RF) methods to select relevant features for the vegetation index set. Feature filtering thresholds were set at 0.53 and 1.9 for the respective methods. The union set of features selected by both methods was used for recursive feature elimination (RFE), ultimately yielding the optimal subset of features for constructing Cubist and Recurrent Neural Network (RNN) yield prediction models. The results of this study demonstrate that the Cubist model, constructed using the optimal subset of features obtained through the integrated feature selection method (PCRF-RFE), consistently outperformed the RNN model. It exhibited the highest accuracy during both the flowering and grain filling stages, surpassing models constructed using all features or subsets derived from a single feature selection method. 
This confirms the efficacy of the PCRF-RFE method and offers valuable insights and references for future research in the realms of feature selection and yield prediction studies.
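
The first stage of the approach described in this abstract (two threshold filters whose selections are merged by union before RFE) can be sketched as below. This is a hedged illustration only. The thresholds 0.53 and 1.9 come from the abstract, but comparing the absolute Pearson coefficient with `>=` is an assumption, the random forest importances are supplied as precomputed made-up numbers rather than being fitted, the index names and values are invented, and the RFE step and the Cubist and RNN models are not implemented here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def candidate_features(features, target, importances,
                       pc_threshold=0.53, rf_threshold=1.9):
    """Union of the features passing either filter.

    features:    {name: list of index values per plot}
    target:      list of observed yields per plot
    importances: {name: precomputed importance score} (hypothetical input)
    """
    by_pc = {name for name, vals in features.items()
             if abs(pearson(vals, target)) >= pc_threshold}
    by_rf = {name for name, score in importances.items()
             if score >= rf_threshold}
    return by_pc | by_rf

# Made-up values for four plots, for illustration only.
features = {
    "ndvi": [0.2, 0.4, 0.6, 0.8],
    "gndvi": [0.5, 0.1, 0.4, 0.2],
    "ndre": [0.3, 0.3, 0.2, 0.4],
}
yields = [2.0, 3.0, 4.1, 5.2]
importances = {"ndvi": 5.0, "gndvi": 0.4, "ndre": 2.3}

selected = candidate_features(features, yields, importances)
```

Here `ndvi` passes the correlation filter, `ndre` passes only the importance filter, and `gndvi` passes neither, so the union keeps a feature that one filter alone would have dropped. That union is the set the abstract says is handed to RFE.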

