Crop yield prediction (CYP) is a major problem in agriculture. At the start of each growing season, agricultural planners need to estimate the yield of every crop involved (Frausto-Solis et al., 2009). Unfortunately, CYP is difficult because it depends on many interrelated factors (Liu et al., 2001; Marinković et al., 2009). Moreover, yield is affected both by farmer decisions (such as irrigation, pest control and fertilizer applications, crop rotation, and land preparation) and by uncontrollable factors (such as weather, subsidies, and the market). As stated by Ruß (2009), yield prediction has traditionally relied on farmers' long-term experience with specific fields, crops, and climate conditions, which can be inaccurate. Simple estimators, such as the average of several previous yields or the last obtained yield, are also used. Nevertheless, crop yield varies spatially and temporally, and it behaves non-linearly (Liu et al., 2001; Drummond et al., 2003; Schlenker & Roberts, 2006).
Abstract

An important issue in agricultural planning is the accurate estimation of yield for the numerous crops involved. Machine learning (ML) is an essential approach for achieving practical and effective solutions to this problem. Many comparisons of ML methods for yield prediction have been made in search of the most accurate technique. Generally, however, the number of crops and techniques evaluated is too low to provide enough information for agricultural planning purposes. This paper compares the predictive accuracy of ML and linear regression techniques for crop yield prediction on ten crop datasets. Multiple linear regression, M5-Prime regression trees, multilayer perceptron neural networks, support vector regression, and k-nearest neighbor methods were ranked. Four accuracy metrics were used to validate the models: the root mean square error (RMSE), the root relative square error (RRSE), the normalized mean absolute error (MAE), and the correlation factor (R). Real data from an irrigation zone in Mexico were used to build the models. Models were tested with samples from two consecutive years. The results show that the M5-Prime and k-nearest neighbor techniques obtain the lowest average RMSE (5.14 and 4.91), the lowest RRSE (79.46% and 79.78%), the lowest average MAE (18.12% and 19.42%), and the highest average correlation factors (0.41 and 0.42). Since M5-Prime achieves the largest number of crop yield models with the lowest errors, it is a very suitable tool for large-scale crop yield prediction in agricultural planning.
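
As an illustration of the four accuracy metrics named above, the following is a minimal Python sketch based on their common textbook definitions. The exact formulas used in this study are not given in this section, so the definitions here are assumptions. In particular, the MAE is normalized by the mean of the observed yields, which is only one of several possible normalizations. The function names and the sample values are hypothetical and are not taken from the datasets.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the units of the yield."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def rrse(actual, predicted):
    """Root relative square error (%), relative to always predicting the mean."""
    mean_a = sum(actual) / len(actual)
    num = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean_a) ** 2 for a in actual)
    return 100.0 * math.sqrt(num / den)


def normalized_mae(actual, predicted):
    """Mean absolute error as a percentage of the mean observed yield.

    Assumption: normalization by the mean of the observed values.
    """
    n = len(actual)
    mae = sum(abs(p - a) for a, p in zip(actual, predicted)) / n
    return 100.0 * mae / (sum(actual) / n)


def correlation(actual, predicted):
    """Pearson correlation between observed and predicted yields."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical observed and predicted yields, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
estimated = [3.0, 3.0, 7.0, 7.0]

print(rmse(observed, estimated))
print(rrse(observed, estimated))
print(normalized_mae(observed, estimated))
print(correlation(observed, estimated))
```

Under these definitions, an RRSE below 100% means that a model does better than always predicting the mean observed yield, which gives a reference point for reading RRSE values such as those reported above.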