Background The COVID-19 outbreak has left many people isolated in their homes. These people are turning to social media for news and social connection, which leaves them vulnerable to believing and sharing misinformation. Health-related misinformation threatens adherence to public health messaging, and monitoring its spread on social media is critical to understanding how ideas that could harm public health evolve. Objective The aim of this study is to use Twitter data to explore methods to characterize and classify four COVID-19 conspiracy theories and to provide context for each of these conspiracy theories through the first 5 months of the pandemic. Methods We began with a corpus of approximately 120 million COVID-19 tweets spanning late January to early May 2020. We first filtered these tweets using regular expressions (n=1.8 million) and then used random forest classification models to identify tweets related to each of the four conspiracy theories. The resulting classified data sets were then used in downstream sentiment analysis and dynamic topic modeling to characterize the linguistic features of COVID-19 conspiracy theories and how they evolve over time. Results Using model-labeled data for analysis increased the proportion of data matching misinformation indicators. Random forest classifier performance varied across the four conspiracy theories (F1 scores between 0.347 and 0.857) and was higher for more narrowly defined theories. We showed that misinformation tweets express more negative sentiment than nonmisinformation tweets and that the theories evolve over time, incorporating details from unrelated conspiracy theories as well as real-world events. 
Conclusions Although we focus here on health-related misinformation, this combination of approaches is not specific to public health and is valuable for characterizing misinformation in general, an important first step toward creating targeted messaging to counteract its spread. Initial messaging should aim to preempt generalized misinformation before it becomes widespread, while later messaging will need to target evolving conspiracy theories and the new facets each one incorporates over time.
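The two-stage approach described in the Methods (a regular-expression prefilter followed by a classifier whose quality is summarized by an F1 score) can be illustrated with a minimal Python sketch. The patterns in `CONSPIRACY_PATTERNS`, the example tweets, and the helper names are hypothetical and are not the study's actual filters. The random forest stage is replaced by hand-assigned labels, so the example shows only how a prefilter narrows a corpus and how F1 combines precision and recall (F1 = 2PR / (P + R)).

```python
import re

# Hypothetical keyword patterns for a single conspiracy theory;
# these are illustrative only, not the filters used in the study.
CONSPIRACY_PATTERNS = [
    re.compile(r"\b5g\b", re.IGNORECASE),
    re.compile(r"\bcell\s+towers?\b", re.IGNORECASE),
]


def prefilter(tweets):
    """Cheap first stage: keep tweets matching at least one pattern."""
    return [t for t in tweets if any(p.search(t) for p in CONSPIRACY_PATTERNS)]


def f1_score(y_true, y_pred):
    """Binary F1 (1 = misinformation): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


tweets = [
    "5G towers are spreading the virus",
    "New 5G phones launched today",
    "Wash your hands and stay home",
    "Burn the cell towers before it's too late",
]
candidates = prefilter(tweets)
print(candidates)  # three of the four tweets match a pattern

# Hand-assigned labels stand in for annotator labels (y_true) and
# classifier output (y_pred) on the three candidate tweets.
y_true = [1, 0, 1]
y_pred = [1, 1, 1]
print(round(f1_score(y_true, y_pred), 3))  # prints 0.8
```

The phone-launch tweet shows why a keyword match alone is not enough: it passes the prefilter but is not misinformation, which is the kind of case a second, supervised classification stage is meant to catch.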
Background Health authorities can minimize the impact of an emergent infectious disease outbreak through effective and timely risk communication, which can build trust and adherence to subsequent behavioral messaging. Monitoring the psychological impacts of an outbreak, as well as public adherence to such messaging, is also important for minimizing its long-term effects. Objective We used social media data from Twitter to identify human behaviors relevant to COVID-19 transmission, as well as the perceived impacts of COVID-19 on individuals, as a first step toward real-time monitoring of public perceptions to inform public health communications. Methods We developed a coding schema of 6 categories and 11 subcategories that covered a wide range of behaviors as well as the impacts of the pandemic (eg, economic and mental health impacts). We used this schema to label training data and to build supervised learning classifiers for classes with sufficient labels. Classifiers that performed adequately were applied to the remaining corpus, and temporal and geospatial trends were assessed. We compared the classified patterns with ground-truth mobility data and confirmed COVID-19 case counts to assess the strength of the signal. Results We applied our labeling schema to approximately 7200 tweets. The worst-performing classifiers, which identified tweets about monitoring symptoms and testing, had F1 scores of only 0.18 to 0.28. Classifiers for social distancing, however, were much stronger, with F1 scores of 0.64 to 0.66. We applied the social distancing classifiers to over 228 million tweets. We found temporal patterns consistent with real-world events, as well as correlations of up to –0.5 between social distancing signals on Twitter and ground-truth mobility across the United States. Conclusions Behaviors discussed on Twitter are exceptionally varied. 
Twitter can provide useful information for parameterizing models that incorporate human behavior, as well as for informing public health communication strategies by describing awareness of and compliance with suggested behaviors.
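Comparing the Twitter social distancing signal with ground-truth mobility amounts to correlating two aligned daily time series. The sketch below computes a Pearson correlation in plain Python. The abstract does not state which correlation coefficient was used, so Pearson is an assumption here, and the daily values are invented toy numbers (not the study's data) chosen only to show how a rising share of social distancing tweets alongside falling mobility produces a negative coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy daily series: the share of tweets classified as
# social distancing, and a mobility index relative to a baseline of 1.0.
distancing_share = [0.02, 0.03, 0.05, 0.08, 0.10, 0.09, 0.11]
mobility_index = [1.00, 0.98, 0.90, 0.75, 0.70, 0.72, 0.68]

r = pearson(distancing_share, mobility_index)
print(f"r = {r:.2f}")  # negative: more distancing talk, less movement
```

A real comparison would also need to align the series by date and region and decide how to handle reporting lags, which this sketch leaves out.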
BACKGROUND Health authorities can minimize the impact of an emergent infectious disease outbreak through effective and timely risk communication, which can build trust and adherence to subsequent behavioral messaging. Monitoring the psychological impacts of an outbreak, as well as public adherence to such messaging, is also important for minimizing its long-term effects. OBJECTIVE We use social media data to identify human behaviors relevant to COVID-19 transmission and the perceived impacts of COVID-19 on individuals as a first step toward real-time monitoring of public perceptions to inform public health communications. METHODS We develop a coding schema of 6 categories and 11 subcategories that covers a wide range of behaviors as well as the impacts of the pandemic (e.g., economic and mental health impacts). We use this schema to label training data and to build supervised learning classifiers for classes with sufficient labels. Classifiers that perform adequately are applied to our remaining corpus, and temporal and geospatial trends are assessed. We compare the classified patterns with ground-truth mobility data and confirmed COVID-19 case counts to assess the strength of the signal. RESULTS We apply our labeling schema to ~7200 tweets. The worst-performing classifiers, which identify tweets about monitoring symptoms and testing, have F1 scores of only 0.18-0.28. Classifiers for social distancing, however, are much stronger, with F1 scores of 0.64-0.66. We apply the social distancing classifiers to over 228 million tweets. We show temporal patterns consistent with real-world events, as well as correlations of up to -0.5 between social distancing signals on Twitter and ground-truth mobility throughout the United States. CONCLUSIONS Behaviors discussed on Twitter are exceptionally varied. 
Twitter can provide useful information for parameterizing models that incorporate human behavior, as well as for informing public health communication strategies by describing awareness of and compliance with suggested behaviors.
BACKGROUND Misinformation spread through social media is a growing problem, and the emergence of COVID-19 has caused an explosion in new activity and renewed focus on the resulting threat to public health. Given this increased visibility, in-depth analysis of how COVID-19 misinformation spreads is critical to understanding the evolution of ideas with potential negative public health impact. OBJECTIVE We use Twitter data to explore methods for characterizing and classifying major COVID-19 myths and conspiracy theories, and to provide context for the theories’ evolution through the pandemic’s early months. METHODS Using a curated data set of approximately 120 million COVID-19 tweets spanning late January to early May 2020, we applied methods including regular expression filtering, supervised machine learning, sentiment analysis, geospatial analysis, and dynamic topic modeling to trace the spread of misinformation and to characterize novel features of COVID-19 conspiracy theories. RESULTS Random forest models for four major misinformation topics gave mixed results: narrowly defined conspiracy theories achieved F1 scores of 0.804 and 0.857, while broader theories performed measurably worse, with scores of 0.654 and 0.347. Despite this, using model-labeled data for analysis increased the proportion of data matching misinformation indicators. We identified distinct increases in negative sentiment, theory-specific trends in geospatial spread, and the evolution of conspiracy theory topics and subtopics over time. CONCLUSIONS COVID-19–related conspiracy theories show that history frequently repeats itself, with the same conspiracy theories being recycled for new situations. 
We use a combination of supervised learning, unsupervised learning, and natural language processing techniques to examine how these theories evolved over the first four months of the COVID-19 outbreak and how they intertwine, and to generate hypotheses about more effective public health messaging to combat misinformation in online spaces.
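The sentiment comparison between misinformation and nonmisinformation tweets can be illustrated with a deliberately simple lexicon-based scorer. The word lists, tweets, and function names below are hypothetical, and the abstract does not say which sentiment tool the study used; a real analysis would rely on an established, validated sentiment model rather than this toy lexicon. The sketch shows only the shape of the comparison: score each tweet, then compare mean sentiment between the two model-labeled groups.

```python
import re
from statistics import mean

# Hypothetical toy lexicon; not the sentiment model used in the study.
POSITIVE = {"safe", "hope", "thanks", "recover", "good"}
NEGATIVE = {"lie", "lies", "dangerous", "fear", "poison", "hoax"}

TOKEN = re.compile(r"[a-z']+")


def sentiment(text):
    """Return (positive count - negative count) divided by token count."""
    tokens = TOKEN.findall(text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


# Hypothetical tweets standing in for classifier-labeled groups.
misinformation = [
    "The virus is a hoax and the vaccine is poison",
    "They lie to us, the towers are dangerous",
]
nonmisinformation = [
    "Stay safe and thanks to all health workers",
    "Good news, more patients recover each day",
]

mis_mean = mean(sentiment(t) for t in misinformation)
non_mean = mean(sentiment(t) for t in nonmisinformation)
print(f"misinformation: {mis_mean:.3f}, nonmisinformation: {non_mean:.3f}")
```

Normalizing by token count keeps long and short tweets on a comparable scale; tracking the group means per week instead of overall would be one way to look for the increases in negative sentiment over time described in the Results.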