The Horn of Africa is highly vulnerable to droughts and floods, and reliable long-term forecasting is a key part of building resilience. However, the prediction of the “long rains” season (March–May) is particularly challenging for dynamical climate prediction models. Meanwhile, the potential for machine learning to improve seasonal precipitation forecasts in the region remains largely unexplored. Here, we implement and evaluate four data-driven models for prediction of long rains rainfall: ridge and lasso linear regressions, random forests, and a single-layer neural network. Predictors are based on sea surface temperatures (SSTs), zonal winds, land state, and climate indices, and the target variables are precipitation totals for each separate month (March, April, and May) in the Horn of Africa drylands, with separate predictions made for lead times of 1–3 months. Results reveal a tendency for overfitting when predictors are preselected based on correlations to the target variable over the entire historical period, a frequent practice in machine learning-based seasonal forecasting. Using this conventional approach, the data-driven methods, and particularly the lasso and ridge regressions, often outperform dynamical seasonal hindcasts. However, when the selection of predictors is done independently of the test data, by performing the predictor selection within the cross-validation loop, the performance of all four data-driven models is poorer than that of the dynamical hindcasts. These findings should not discourage future applications of machine learning for rainfall forecasting in the region. Rather, they should be seen as a note of caution against optimistically biased results that are not indicative of the true skill of operational forecast systems.
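The contrast drawn above, between preselecting predictors on the entire historical period and selecting them within the cross-validation loop, can be illustrated with a minimal sketch. It is not the study's code. It assumes pure-noise synthetic data (40 "years", 200 candidate predictors), a single predictor chosen by absolute correlation, a simple linear fit, and 5-fold cross-validation. All function names and sizes are hypothetical choices made for illustration only.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def select_predictor(X, y, rows):
    """Column index with the largest |correlation| to y, computed on `rows` only."""
    ys = [y[i] for i in rows]
    return max(range(len(X[0])),
               key=lambda j: abs(pearson([X[i][j] for i in rows], ys)))

def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx

def cv_skill(X, y, select_inside_loop, n_folds=5):
    """Correlation between out-of-fold predictions and observations."""
    rows = list(range(len(y)))
    # Leaky variant: the predictor is chosen once, using every year (test years included).
    preselected = None if select_inside_loop else select_predictor(X, y, rows)
    preds = [0.0] * len(y)
    for k in range(n_folds):
        test = rows[k::n_folds]
        held_out = set(test)
        train = [i for i in rows if i not in held_out]
        # Clean variant: the predictor is re-chosen from the training years of each fold.
        j = select_predictor(X, y, train) if select_inside_loop else preselected
        slope, intercept = fit_line([X[i][j] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = intercept + slope * X[i][j]
    return pearson(preds, y)

# Assumption: predictors and target are independent Gaussian noise, so no real skill exists.
rng = random.Random(0)
n_years, n_candidates, n_trials = 40, 200, 20
leaky, honest = [], []
for _ in range(n_trials):
    X = [[rng.gauss(0, 1) for _ in range(n_candidates)] for _ in range(n_years)]
    y = [rng.gauss(0, 1) for _ in range(n_years)]
    leaky.append(cv_skill(X, y, select_inside_loop=False))
    honest.append(cv_skill(X, y, select_inside_loop=True))

mean_leaky = sum(leaky) / n_trials
mean_honest = sum(honest) / n_trials
print(f"preselected on all years: mean r = {mean_leaky:.2f}")
print(f"selected inside CV loop:  mean r = {mean_honest:.2f}")
```

Because the data contain no signal, any positive skill in the first line of output comes from the selection step having seen the test years. In a scikit-learn workflow, the same separation would typically be obtained by placing the selection step (for example `SelectKBest`) inside a `Pipeline` that is passed to the cross-validation routine, so that selection is refit on each training fold.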
<p>The Horn of Africa is prone to climate impacts; frequent droughts and floods leave the region vulnerable. Improving (sub-)seasonal prediction and producing more reliable long-term forecasts is therefore an important asset in building resilience. Most of the region is characterized by a bimodal precipitation cycle, with rainy seasons in boreal spring (March–May), termed the long rains, and boreal autumn (October–November), termed the short rains. Previous studies on seasonal forecasting have focused mostly on empirical linear regression methods using information from ocean–atmosphere modes. To date, the potential of more complex methods, such as machine learning approaches, to improve seasonal precipitation predictability in the Horn of Africa remains understudied.</p>
<p>In this study, machine learning models targeting precipitation during the long rains are developed. The focus on the long rains is motivated by the fact that it is the main rainy season in the region and that its sources of predictability have proven more difficult to pin down. The long rains season has weak internal coherence, and treating the months separately has been shown to enhance prediction skill. Therefore, machine learning models are constructed separately for each month (March, April, and May) at lead times of 1–3 months. Following an extensive literature survey, the predictors of long rains precipitation at seasonal timescales selected in this study include coupled ocean–atmosphere oscillation indices (such as MJO, ENSO, and PDO), regional zonal winds at 200 mb and 850 mb, and sea surface temperature (SST) regions with strong correlation to long rains precipitation. Further, a selection of additional terrestrial and oceanic predictors is guided by Lagrangian transport modeling, used to identify the regions sourcing moisture during the long rains. This set of predictors includes soil moisture, land surface temperature, normalized difference vegetation index (NDVI), leaf area index (LAI), and SST, each averaged over the climatological source region of long rains precipitation. Finally, we provide new insights into the predictability of long rains precipitation at seasonal timescales by analyzing the relative importance of the different predictors used to develop the machine learning models.</p>