Simple new methods for predicting crop yield are needed as climate change increases environmental stress. Machine learning algorithms could be a powerful tool for predicting crop yield; however, the feature variables they require and the differences in their prediction accuracy have not been well studied. The objectives of this study were to identify the best combination of feature variables for predicting the yield of cowpea (Vigna unguiculata), which is widely grown in the central Sudan Savanna under environmentally restricted conditions, and to clarify the differences in accuracy among major machine learning algorithms. The study also explored the environmental and plant factors affecting the prediction errors. Sample data were obtained from cowpea field experiments in the central Sudan Savanna. Predictions were made using 28 models, covering four machine learning algorithms and seven combinations of feature variables. The Support Vector Regression and Neural Network algorithms effectively predicted cowpea yields when continuous leaf coverage rates were used as feature variables; however, their prediction accuracy differed somewhat depending on soil type and growth habit. Using feature variables related to shoot growth and plant physiological status could minimize prediction errors.
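
The count of 28 models follows from crossing four algorithms with seven feature combinations. The sketch below only illustrates that arithmetic and is not the study's actual setup. The abstract names two of the four algorithms, so `algorithm_3` and `algorithm_4` are placeholders. The three feature groups are hypothetical, as is the assumption that the seven combinations are the non-empty subsets of three groups (2^3 - 1 = 7); the abstract does not say how the combinations were formed.

```python
from itertools import combinations, product

# Two algorithms are named in the abstract; the other two are not,
# so they appear here as placeholders.
ALGORITHMS = [
    "support_vector_regression",
    "neural_network",
    "algorithm_3",  # placeholder: not named in the abstract
    "algorithm_4",  # placeholder: not named in the abstract
]

# Hypothetical feature groups, chosen only to illustrate the counting.
FEATURE_GROUPS = ["leaf_coverage", "environment", "plant_traits"]


def feature_combinations(groups):
    """Return every non-empty subset of the feature groups as a tuple."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


def model_grid(algorithms, groups):
    """Pair every algorithm with every feature combination."""
    return list(product(algorithms, feature_combinations(groups)))


grid = model_grid(ALGORITHMS, FEATURE_GROUPS)
print(len(feature_combinations(FEATURE_GROUPS)), "feature combinations")
print(len(grid), "models")
```

Each entry of `grid` is an `(algorithm, feature_combination)` pair that could be trained and scored separately, which would let prediction accuracy be compared across algorithms and across feature sets.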
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-024-80288-3.