Background
Mainland China, the world’s most populous region, experienced a large-scale coronavirus disease 2019 (COVID-19) outbreak in 2020 and 2021, respectively. Existing infodemiology studies have primarily concentrated on the prospective surveillance of confirmed cases or symptoms which met the criterion for investigators; nevertheless, the actual impact regarding COVID-19 on the public and subsequent attitudes of different groups towards the COVID-19 epidemic were neglected.
Methods
This study aimed to examine the public web-based search trends and behavior patterns related to COVID-19 outbreaks in mainland China by using hot words and Baidu Index (BI). The initial hot words (the high-frequency words on the Internet) and the epidemic data (2019/12/01–2021/11/30) were mined from infodemiology platforms. The final hot words table was established by two-rounds of hot words screening and double-level hot words classification. Temporal distribution and demographic portraits of COVID-19 were queried by search trends service supplied from BI to perform the correlation analysis. Further, we used the parameter estimation to quantitatively forecast the geographical distribution of COVID-19 in the future.
Results
The final English-Chinese bilingual table was established including six domains and 32 subordinate hot words. According to the temporal distribution of domains and subordinate hot words in 2020 and 2021, the peaks of searching subordinate hot words and COVID-19 outbreak periods had significant temporal correlation and the subordinate hot words in COVID-19 Related and Territory domains were reliable for COVID-19 surveillance. Gender distribution results showed that Territory domain (the male proportion: 67.69%; standard deviation (SD): 5.88%) and Symptoms/Symptom and Public Health (the female proportion: 57.95%, 56.61%; SD: 0, 9.06%) domains were searched more by male and female groups respectively. The results of age distribution of hot words showed that people aged 20–50 (middle-aged people) had a higher online search intensity, and the group of 20–29, 30–39 years old focused more on Media and Symptoms/Symptom (proportion: 45.43%, 51.66%; SD: 15.37%, 16.59%) domains respectively. Finally, based on frequency rankings of searching hot words and confirmed cases in Mainland China, the epidemic situation of provinces and Chinese administrative divisions were divided into 5 levels of early-warning regions. Central, East and South China regions would be impacted again by the COVID-19 in the future.