Background
The COVID-19 pandemic has persisted for more than a year, and log and register data on coronavirus have been used to build models for detecting the pandemic. However, many sources contain unreliable health information on COVID-19 and its symptoms, and platforms cannot characterize the users performing searches. Prior studies have assessed symptom searches in general search engines (Google/Google Trends). Little is known about how modeling log data on smell/taste disorders and coronavirus from dedicated internet databases used by citizens and health care professionals (HCPs) could enhance disease surveillance. Our materials and methods provide a novel approach to analyzing web-based information seeking to detect infectious disease outbreaks.
Objective
The aims of this study were (1) to assess whether citizens’ and professionals’ searches for smell/taste disorders and coronavirus relate to epidemiological data on COVID-19 cases, and (2) to test our negative binomial regression modeling (ie, whether the inclusion of the case count could improve the model).
Methods
We collected weekly log data on searches related to COVID-19 (smell/taste disorders, coronavirus) between December 30, 2019, and November 30, 2020 (49 weeks). Two major medical internet databases in Finland were used: Health Library (HL), a free portal aimed at citizens, and Physician’s Database (PD), a database widely used among HCPs. Log data from the databases were combined with register data on the numbers of COVID-19 cases reported in the Finnish National Infectious Diseases Register. We used negative binomial regression modeling to assess whether the case numbers could explain some of the dynamics of the searches observed in the database logs.
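As an illustration of this kind of analysis, the following minimal sketch fits a negative binomial (NB2) regression of weekly search counts on weekly case counts and compares it with an intercept-only model. It is not the authors' implementation: the weekly values are invented for illustration, the names `searches`, `cases`, `ALPHA`, `nb2_loglik`, and `fit_nb2` are hypothetical, and the dispersion parameter is held fixed for simplicity, whereas statistical packages usually estimate it jointly with the coefficients.

```python
import math

# Hypothetical weekly data, invented purely for illustration (not study data).
searches = [20, 22, 25, 40, 70, 110, 200, 160, 90, 60, 38, 27]
cases = [0.0, 0.0, 0.1, 0.5, 1.2, 2.0, 3.5, 2.8, 1.5, 0.8, 0.4, 0.2]  # in hundreds

ALPHA = 0.1  # assumed fixed dispersion; Var(y) = mu + ALPHA * mu**2


def nb2_loglik(y, mu, alpha):
    """Log-likelihood of counts y under NB2 with means mu and dispersion alpha."""
    r = 1.0 / alpha
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
                  + r * math.log(r / (r + mi))
                  + yi * math.log(mi / (r + mi)))
    return total


def fit_nb2(y, x, alpha, n_iter=50):
    """Fisher scoring for log(mu) = b0 + b1 * x with alpha held fixed."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score contributions and Fisher weights for the log link.
        s = [(yi - mi) / (1.0 + alpha * mi) for yi, mi in zip(y, mu)]
        w = [mi / (1.0 + alpha * mi) for mi in mu]
        g0 = sum(s)
        g1 = sum(si * xi for si, xi in zip(s, x))
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system (information matrix) * step = score.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


b0, b1 = fit_nb2(searches, cases, ALPHA)

# With alpha fixed, the intercept-only fit has mu equal to the sample mean.
mean_y = sum(searches) / len(searches)
ll_null = nb2_loglik(searches, [mean_y] * len(searches), ALPHA)
ll_full = nb2_loglik(searches, [math.exp(b0 + b1 * c) for c in cases], ALPHA)

# Likelihood ratio statistic; 3.84 is the 5% chi-square cutoff for 1 df.
lr_stat = 2.0 * (ll_full - ll_null)
print(f"b1 = {b1:.3f}, LR statistic = {lr_stat:.2f} (5% cutoff: 3.84)")
```

A likelihood ratio statistic above the cutoff would suggest that adding the case count improves the model for that search series. In practice, an established package that also estimates the dispersion would be preferable to this hand-rolled fit.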
Results
We found that coronavirus searches increased drastically in HL (0 to 744,113) and PD (4 to 5375) between December 2019 and March 2020, prior to the first wave of COVID-19 cases. Searches for smell disorders in HL doubled from the end of December 2019 to the end of March 2020 (2148 to 4195), and searches for taste disorders in HL increased from mid-May to the end of November (0 to 1980). Case numbers were significantly associated with searches for smell disorders (P<.001) and taste disorders (P<.001) in HL, and with coronavirus searches (P<.001) in PD. We could not identify any other associations between case numbers and searches in either database.
Conclusions
Novel infodemiological approaches could be used in analyzing database logs. Modeling log data from web-based sources improved the model only occasionally. However, search behaviors among citizens and professionals could be used as a supplementary source of information for infectious disease surveillance. Further research is needed to apply statistical models to log data from dedicated medical databases.