Background
Public health surveillance is based on the continuous and systematic collection, analysis, and interpretation of data. This informs the development of early warning systems to monitor epidemics and documents the impact of intervention measures. The introduction of digital data sources, and specifically sources available on the internet, has impacted the field of public health surveillance. New opportunities enabled by the underlying availability and scale of internet-based sources (IBSs) have paved the way for novel approaches for disease surveillance, exploration of health communities, and the study of epidemic dynamics. This field and approach is also known as infodemiology or infoveillance.
Objective
This review aimed to assess research findings regarding the application of IBSs for public health surveillance (infodemiology or infoveillance). To achieve this, we have presented a comprehensive systematic literature review with a focus on these sources and their limitations, the diseases targeted, and commonly applied methods.
Methods
A systematic literature review was conducted targeting publications between 2012 and 2018 that leveraged IBSs for public health surveillance, outbreak forecasting, disease characterization, diagnosis prediction, content analysis, and health-topic identification. The search results were filtered according to previously defined inclusion and exclusion criteria.
Results
Spanning a total of 162 publications, we determined infectious diseases to be the preferred case study (108/162, 66.7%). Of the eight categories of IBSs (search queries, social media, news, discussion forums, websites, web encyclopedia, and online obituaries), search queries and social media were applied in 95.1% (154/162) of the reviewed publications. We also identified limitations in representativeness and biased user age groups, as well as high susceptibility to media events by search queries, social media, and web encyclopedias.
Conclusions
IBSs are a valuable proxy to study illnesses affecting the general population; however, it is important to characterize which diseases are best suited for the available sources; the literature shows that the level of engagement among online platforms can be a potential indicator. There is a necessity to understand the population’s online behavior; in addition, the exploration of health information dissemination and its content is significantly unexplored. With this information, we can understand how the population communicates about illnesses online and, in the process, benefit public health.