Background
Since the beginning of the COVID-19 pandemic in late 2019, its far-reaching impacts have been witnessed globally across all aspects of human life, such as health, economy, politics, and education. These pervasive impacts place significant burdens on all population groups and give rise to varied concerns and sentiments among them.
Objective
This study aims to identify the concerns, sentiments, and disparities of various population groups during the COVID-19 pandemic through a cross-sectional infoveillance study based on large-scale Twitter data mining.
Methods
This study consisted of three steps: first, tweets posted during the pandemic were collected and preprocessed on a large scale; second, the key population attributes, concerns, sentiments, and emotions were extracted via a collection of natural language processing procedures; third, multiple analyses were conducted to reveal concerns, sentiments, and disparities among population groups during the pandemic. Overall, this study implemented a quick, effective, and economical approach for analyzing population-level disparities during a public health event. The source code developed in this study was released for free public use on GitHub.
Results
A total of 1,015,655 original English tweets posted from August 7 to 12, 2020, were acquired and analyzed. Organizations were significantly more concerned about COVID-19 (odds ratio [OR] 3.48, 95% CI 3.39-3.58) and expressed more fear and depression than individuals. Females were less concerned about COVID-19 (OR 0.73, 95% CI 0.71-0.75) and expressed less fear and depression than males. Among all age groups (ie, ≤18, 19-29, 30-39, and ≥40 years of age), the attention ORs of COVID-19 fear and depression increased significantly with age. Notably, not all females paid less attention to COVID-19 than males: in the age group of 40 years or older, females were more concerned than males, especially regarding the economic and education topics. In addition, males 40 years or older and males 18 years or younger were the least positive. Lastly, in all sentiment analyses, the sentiment polarities regarding political topics were always the lowest among the five topics of concern across all population groups.
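For readers unfamiliar with how an odds ratio and its 95% CI are derived from group counts, the following is a minimal sketch. It assumes a standard 2x2 contingency table and a Wald interval on the log odds ratio; the abstract does not state which interval method the study used, and the function name `odds_ratio_ci` and the counts in the usage example are hypothetical, not data from this study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    Hypothetical layout (an assumption, not taken from the study):
      a: tweets from group 1 that mention the topic
      b: tweets from group 1 that do not
      c: tweets from group 2 that mention the topic
      d: tweets from group 2 that do not
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only (made up for this example).
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
print(f"OR {or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An OR above 1 with a CI that excludes 1 indicates that group 1 mentions the topic significantly more often than group 2, which is how the organization-versus-individual and female-versus-male comparisons above can be read.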
Conclusions
Through large-scale Twitter data mining, this study revealed that meaningful differences in concerns and sentiments about COVID-19-related topics existed among population groups during the study period. Therefore, specialized and varied attention and support are needed for different population groups. In addition, the efficient analysis method implemented by our publicly released code can be used to dynamically track how the concerns and sentiments of each population group evolve during the pandemic or any other major event, supporting better-informed public health research and interventions.