Influenza-like illnesses (ILI) cause deaths and hospitalizations across the globe. Traditional surveillance systems rely on data from general medical practitioners, a process that is resource-intensive and subject to reporting delays. Although recent studies have shown the potential utility of free and fast alternatives such as web and social media data, their reliability cannot be generalized across countries because of differences in technological culture. Meanwhile, few studies have explored these free online data for (sub-Saharan) African countries. In this paper, we utilize Google Trends (GT) data for ILI forecasting in South Africa. We study models based on deep learning (long short-term memory (LSTM) and feedforward neural networks (FNN)), machine learning (multiple linear regression (MLR), elastic net (EN), and support vector machine (SVM)), and statistical time series (seasonal autoregressive integrated moving average (SARIMA)) algorithms. The FNN and SVM models using GT data alone produce forecasts close in accuracy to those fitted to actual ILI data. The algorithms rank differently across the various performance measures. Generally, the deep learning techniques perform better than the other algorithms in our study, although tuning them is quite intricate. Combining GT and historical ILI data enhances the models, and the non-deep-learning algorithms benefit more from this enhancement. Furthermore, we observe that search volume rises in proportion to, and in the same week as, reported infection rates, suggesting that South Africans search Google in the week they feel flu symptoms. Thus, monitoring Google search trends is a reliable proxy for monitoring flu spread in South Africa.
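The timing claim above (search volume moving in the same week as reported ILI) is the kind of statement a lagged correlation can check. The following is a minimal sketch, not the authors' procedure: the function names `pearson`, `lagged_correlations`, and `best_lag` are hypothetical, and the inputs are assumed to be two equally long weekly series.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def lagged_correlations(searches, ili, max_lag=3):
    """Correlate search volume in week t with ILI in week t + lag.

    Lag 0 means both series are compared in the same week; a positive
    lag means searches lead the reported ILI rate by that many weeks.
    """
    if len(searches) != len(ili):
        raise ValueError("series must have equal length")
    result = {}
    for lag in range(max_lag + 1):
        shifted_searches = searches[: len(searches) - lag]
        shifted_ili = ili[lag:]
        result[lag] = pearson(shifted_searches, shifted_ili)
    return result


def best_lag(searches, ili, max_lag=3):
    """Return the lag (in weeks) with the highest correlation."""
    correlations = lagged_correlations(searches, ili, max_lag)
    return max(correlations, key=correlations.get)
```

If the highest correlation sits at lag 0, the two series move together within the same week, which is the pattern the abstract reports; a peak at a positive lag would instead indicate that searches lead the surveillance data.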
This paper investigates the usefulness of Google search patterns combined with Artificial Intelligence (AI) techniques for timely influenza-like illness (ILI) forecasting in each of the nine South African provinces. Traditional surveillance methods are limited by delays in reporting. Existing digital disease surveillance studies that employ alternative online data have scarcely explored sub-Saharan African countries. In South Africa, Google search data have only recently been studied for ILI surveillance, and only at the national level. Meanwhile, the differences in socio-economic and technological conditions across provinces call for a finer spatial investigation. We perform correlation analysis between Google Trends (GT) data for 21 ILI-related terms and real-life ILI surveillance data for each province. Next, we develop models to assess the predictive performance of these GT data for forecasting ILI rates, using time series, machine learning, and deep learning methods. We observe sufficient correlation for only two of the nine provinces: Gauteng and Western Cape. Thus, GT data could only be used to forecast ILI in these two provinces. Interestingly, these two provinces are regarded as the most economically developed. In the other seven provinces, LSTM, a deep learning technique, gives more accurate predictions than a baseline autoregressive model when only past ILI data are used to forecast future ILI trends. The results reveal that, for provinces where GT data are sufficiently available, they are not only free and fast but also an effective predictor of future ILI infection rates, both on their own and when added to past ILI data. The correlation analysis suggests an association between provincial socio-economic development and the usefulness of digital platforms for disease surveillance. Overall, the study establishes the need for finer-scale ILI forecasting, which will inform targeted planning for disease surveillance and interventions.
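For readers unfamiliar with the kind of baseline mentioned above, a first-order autoregressive model can be sketched in a few lines. This is an illustration under stated assumptions, not the model used in the paper: the abstract does not give the autoregressive order, so order 1 is assumed here, and the names `fit_ar1`, `one_step_forecasts`, and `rmse` are hypothetical.

```python
from math import sqrt


def fit_ar1(series):
    """Fit y_t = c + phi * y_(t-1) by ordinary least squares."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    spread = sum((p - mean_prev) ** 2 for p in previous)
    if spread == 0:
        raise ValueError("series is constant; slope is undefined")
    phi = sum(
        (p - mean_prev) * (q - mean_curr) for p, q in zip(previous, current)
    ) / spread
    c = mean_curr - phi * mean_prev
    return c, phi


def one_step_forecasts(history, test, c, phi):
    """Forecast each test week from the previously observed week."""
    forecasts = []
    last_observed = history[-1]
    for actual in test:
        forecasts.append(c + phi * last_observed)
        last_observed = actual  # the true value becomes known afterwards
    return forecasts


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and equally long")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))
```

A typical use would be to fit `fit_ar1` on the training weeks of a province's ILI series, produce `one_step_forecasts` for the held-out weeks, and compare the resulting `rmse` with that of a more flexible model such as an LSTM.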