Background
The potential to harness the plurality of available data in real time, along with advanced data analytics, for the accurate prediction of influenza-like illness (ILI) outbreaks has gained significant scientific interest. Different methodologies based on machine learning techniques and on traditional and alternative data sources, such as ILI surveillance reports, weather reports, search engine queries, and social media, have been explored with the ultimate goal of developing electronic surveillance systems that could complement existing monitoring resources.

Objective
The aim of this study was to investigate, for the first time, the combined use of ILI surveillance data, weather data, and Twitter data along with deep learning techniques toward the development of prediction models able to nowcast and forecast weekly ILI cases. By assessing the predictive power of both traditional and alternative data sources on the use case of ILI, this study aimed to provide a novel approach for corroborating evidence and enhancing accuracy and reliability in the surveillance of infectious diseases.

Methods
The model’s input space consisted of information related to weekly ILI surveillance, web-based social (eg, Twitter) behavior, and weather conditions. For the design and development of the model, relevant data covering the period from 2010 to 2019 and focusing on the Greek population and weather were collected. Long short-term memory (LSTM) neural networks were leveraged to efficiently handle the sequential and nonlinear nature of the collected data. The 3 data categories were first used separately to train 3 LSTM-based primary models. Subsequently, different transfer learning (TL) approaches were explored to create various feature spaces that combine the features extracted from the corresponding primary models’ LSTM layers and feed them to a dense layer.
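The fusion step described in the Methods can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the abstract does not state layer sizes, the look-back window, the number of features per data category, or how the dense layer was trained, so `TinyLSTM`, `fuse`, `TIMESTEPS`, and all dimensions below are hypothetical. The weights are random stand-ins for trained primary models, and the inputs are synthetic.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTM:
    """Forward-only LSTM layer standing in for one primary model's LSTM layer."""

    def __init__(self, n_features, n_hidden, rng):
        self.n_features = n_features
        self.n_hidden = n_hidden
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * n_hidden, n_features + n_hidden))
        self.b = np.zeros(4 * n_hidden)

    def encode(self, seq):
        """Map a (timesteps, n_features) sequence to its final hidden state."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        for x_t in seq:
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return h


def fuse(encoders, inputs):
    """Concatenate the LSTM features of every data category into one vector."""
    return np.concatenate([encoders[k].encode(inputs[k]) for k in sorted(encoders)])


rng = np.random.default_rng(0)
TIMESTEPS = 8  # hypothetical look-back window, in weeks

# Hypothetical feature counts per data category; the abstract does not give them.
encoders = {
    "ili": TinyLSTM(n_features=1, n_hidden=4, rng=rng),
    "weather": TinyLSTM(n_features=3, n_hidden=4, rng=rng),
    "twitter": TinyLSTM(n_features=2, n_hidden=4, rng=rng),
}

# Synthetic input sequences, one per data category.
inputs = {
    name: rng.normal(size=(TIMESTEPS, enc.n_features))
    for name, enc in encoders.items()
}

features = fuse(encoders, inputs)  # combined feature space, shape (12,)

# Dense layer on top of the transferred features (untrained here).
dense_w = rng.normal(0.0, 0.1, size=features.shape[0])
dense_b = 0.0
prediction = float(dense_w @ features + dense_b)
```

In a real pipeline the three encoders would first be trained separately, and only then would their LSTM outputs be combined. Whether the transferred layers are frozen or fine-tuned at that point is a design choice the abstract leaves open.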
Results
The primary model that learned from weather data yielded better forecast accuracy (root mean square error [RMSE]=0.144; Pearson correlation coefficient [PCC]=0.801) than the model trained with ILI historical data (RMSE=0.159; PCC=0.794). The best performance was achieved by the TL-based model leveraging the combination of the 3 data categories (RMSE=0.128; PCC=0.822).

Conclusions
The superiority of the TL-based model, which considers Twitter data, weather data, and ILI surveillance data, reflects the potential of alternative public sources to enhance accurate and reliable prediction of ILI spread. Despite its focus on the use case of Greece, the proposed approach can be generalized to other locations, populations, and social media platforms to support the surveillance of infectious diseases with the ultimate goal of reinforcing preparedness for future epidemics.
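For readers unfamiliar with the two reported metrics, the following sketch shows how RMSE and PCC are computed from paired observed and predicted values. The function names `rmse` and `pearson` are illustrative, and the sample series are made up for demonstration; they are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: lower means predictions are closer to observations."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient: closer to 1 means the series move together."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up weekly values for illustration only.
observed = [0.10, 0.25, 0.40, 0.30, 0.15]
predicted = [0.12, 0.20, 0.35, 0.33, 0.18]

print(f"RMSE={rmse(observed, predicted):.3f}; PCC={pearson(observed, predicted):.3f}")
```

The two metrics capture different properties: RMSE penalizes the magnitude of errors, whereas PCC measures only whether predicted and observed values rise and fall together. This is why the abstract reports both.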