Self-reported COVID-19 symptoms on Twitter: an analysis and a research resource

Sarker, Abeed; Lakamana, Sahithi; Hogg-Bremer, Whitney; Xie, Angel; Al-Garadi, Mohammed Ali; Yang, Yuan-Chi

doi:10.1093/jamia/ocaa116

Cited by 123 publications

(112 citation statements)

References 14 publications

Supporting

Mentioning

106

Contrasting


“…Such approaches include sentiment analysis, educational purposes, and efforts to measure and raise public awareness. Recent approaches to analyzing aspects of the COVID-19 pandemic using social media data include monitoring the Twitter usage of G7 leaders [58], monitoring self-reported symptoms on Twitter [59], and analyzing the public perception of the disease through Facebook [60]. Moreover, infodemiology sources have provided valuable input in recruiting online survey participants through Facebook to measure individuals’ COVID-19 confidence levels [61] and in assessing the behavioral variations in COVID-19-related online search traffic in more than one search engine [62].…”

Section: Discussion (mentioning)

confidence: 99%

COVID-19 predictability in the United States using Google Trends time series

Mavragani

Γκίλλας

2020

Sci Rep

115


During the unprecedented situation that all countries around the globe are facing due to the Coronavirus disease 2019 (COVID-19) pandemic, which has also had severe socioeconomic consequences, it is imperative to explore novel approaches to monitoring and forecasting regional outbreaks as they happen or even before they do so. To that end, this paper presents the role of Google query data in the predictability of COVID-19 in the United States at both the national and state levels. As a preliminary investigation, Pearson and Kendall rank correlations are examined to explore the relationship between Google Trends data and COVID-19 data on cases and deaths. Next, a COVID-19 predictability analysis is performed, with the employed model being a quantile regression that is bias corrected via bootstrap simulation, i.e., a robust regression analysis that is an appropriate statistical approach for guarding against outliers in the sample while also mitigating small sample estimation bias. The results indicate that there are statistically significant correlations between Google Trends and COVID-19 data, while the estimated models exhibit strong COVID-19 predictability. In line with previous work that has suggested that online real-time data are valuable in the monitoring and forecasting of epidemics and outbreaks, it is evident that such infodemiology approaches can assist public health policy makers in addressing the most crucial issues: flattening the curve, allocating health resources, and increasing the effectiveness and preparedness of their respective health care systems.
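The preliminary correlation step described in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the two series below are invented for illustration, the function names `pearson` and `kendall_tau_a` are hypothetical, and the tau-a variant (ties counted as neither concordant nor discordant) is an assumption, since the abstract does not say which Kendall variant was used. The bias-corrected quantile regression is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical weekly values, invented purely for illustration.
search_interest = [10, 25, 40, 55, 70, 65, 90]
reported_cases = [100, 180, 150, 400, 650, 600, 1200]

r = pearson(search_interest, reported_cases)
tau = kendall_tau_a(search_interest, reported_cases)
print(f"Pearson r = {r:.3f}, Kendall tau-a = {tau:.3f}")
```

Pearson measures linear association, while Kendall's tau depends only on the ordering of the values, so reporting both gives a check that is less sensitive to outliers in the case counts.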

“…While Twitter data has been used to identify self-reports of symptoms by people who have tested positive for COVID-19 [3, 4], the shortage of available testing and the delay of test results in the United States motivated us to assess whether Twitter data could be scaled to identify potential cases of COVID-19 that are not based on testing and, thus, may not have been reported to the CDC. There are studies that have not limited their exploration of COVID-19 symptoms on Twitter to users who have tested positive for COVID-19 [5-8]; however, limiting the detection of potential cases to symptoms may still underutilize the information available on Twitter.…”

Section: Discussion (mentioning)

confidence: 99%

“…An approach that has emerged for detecting cases without the need for extensive testing relies on voluntary self-reports of symptoms from the general population [1]. Considering that nearly one of every four adults in the United States already uses Twitter, and nearly half of them use it on a daily basis [2], researchers have begun exploring tweets for mentions of COVID-19 symptoms [3-8]. However, considering the incubation period of COVID-19 [9], detecting cases based on symptoms may not maximize the potential of Twitter data for real-time monitoring.…”

Section: Introduction (mentioning)

confidence: 99%

Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set

Klein

Magge

O’Connor et al.

2021

J Med Internet Res


Background: In the United States, the rapidly evolving COVID-19 outbreak, the shortage of available testing, and the delay of test results present challenges for actively monitoring its spread based on testing alone.

Objective: The objective of this study was to develop, evaluate, and deploy an automatic natural language processing pipeline to collect user-generated Twitter data as a complementary resource for identifying potential cases of COVID-19 in the United States that are not based on testing and, thus, may not have been reported to the Centers for Disease Control and Prevention.

Methods: Beginning January 23, 2020, we collected English tweets from the Twitter Streaming application programming interface that mention keywords related to COVID-19. We applied handwritten regular expressions to identify tweets indicating that the user potentially has been exposed to COVID-19. We automatically filtered out “reported speech” (eg, quotations, news headlines) from the tweets that matched the regular expressions, and two annotators annotated a random sample of 8976 tweets that are geo-tagged or have profile location metadata, distinguishing tweets that self-report potential cases of COVID-19 from those that do not. We used the annotated tweets to train and evaluate deep neural network classifiers based on bidirectional encoder representations from transformers (BERT). Finally, we deployed the automatic pipeline on more than 85 million unlabeled tweets that were continuously collected between March 1 and August 21, 2020.

Results: Interannotator agreement, based on dual annotations for 3644 (41%) of the 8976 tweets, was 0.77 (Cohen κ). A deep neural network classifier, based on a BERT model that was pretrained on tweets related to COVID-19, achieved an F1-score of 0.76 (precision=0.76, recall=0.76) for detecting tweets that self-report potential cases of COVID-19. Upon deploying our automatic pipeline, we identified 13,714 tweets that self-report potential cases of COVID-19 and have US state–level geolocations.

Conclusions: We have made the 13,714 tweets identified in this study, along with each tweet’s time stamp and US state–level geolocation, publicly available to download. This data set presents the opportunity for future work to assess the utility of Twitter data as a complementary resource for tracking the spread of COVID-19.
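The evaluation figures quoted in this abstract (Cohen κ, F1-score, and the 41% dual-annotation share) follow standard formulas, which the sketch below works through. It is only an illustration under stated assumptions: the function names `f1_score` and `cohen_kappa` are hypothetical, and the two annotator label lists are invented and are not the study's annotations. Only the precision, recall, and tweet counts are taken from the abstract.

```python
from collections import Counter


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same class
    # if each labels independently at their own class frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Values reported in the abstract: precision = recall = 0.76.
f1 = f1_score(0.76, 0.76)

# Share of the 8976 sampled tweets that were annotated by both annotators.
dual_share = 3644 / 8976

# Hypothetical binary labels (1 = self-report, 0 = not), invented for illustration.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)

print(f"F1 = {f1:.2f}, dual-annotated share = {dual_share:.0%}, toy kappa = {kappa:.2f}")
```

When precision and recall are equal, the harmonic mean equals that shared value, which is consistent with the reported F1-score of 0.76.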


“…Themes of previous studies that focus on exploration of, description of, correlation of, or predictive modeling with Twitter data during COVID-19 pandemic include sentiment analysis [17,25-28], public attitude/interest measurement [21,29-31], content analysis [15,32-36], topic modeling [16,26,27,37-40], analysis of misinformation, disinformation, or conspiracies [20,41-46], outbreak detection or disease nowcasting/forecasting [18,19], and more [47-52]. Similarly, data from other social media channels (e.g., Weibo, Reddit, Facebook) or search engine statistics are utilized for parallel analyses related to COVID-19 pandemic as well [53-61]…”

Section: Going Beyond Correlations (mentioning)

confidence: 99%

Causal Modeling of Twitter Activity during COVID-19

Gencoglu

Gruber

2020

Computation


Understanding the characteristics of public attention and sentiment is an essential prerequisite for appropriate crisis management during adverse health events. This is even more crucial during a pandemic such as COVID-19, as the primary responsibility for risk management is not centralized in a single institution but distributed across society. While numerous studies utilize Twitter data in descriptive or predictive contexts during the COVID-19 pandemic, causal modeling of public attention has not been investigated. In this study, we propose a causal inference approach to discover and quantify causal relationships between pandemic characteristics (e.g., number of infections and deaths) and Twitter activity as well as public sentiment. Our results show that the proposed method can successfully capture the epidemiological domain knowledge and identify variables that affect public attention and sentiment. We believe our work contributes to the field of infodemiology by distinguishing events that correlate with public attention from events that cause public attention.


Self-reported COVID-19 symptoms on Twitter: an analysis and a research resource

Abstract: Objective: To mine Twitter and quantitatively analyze COVID-19 symptoms self-reported by users, compare symptom distributions across studies, and create a symptom lexicon for future research. Materials and Methods: We retrieved tweets using COVID-19-related…
