It has been over a year since the first known case of coronavirus disease emerged, yet the pandemic is far from over. To date, the coronavirus pandemic has infected over eighty million people and killed more than 1.78 million worldwide. This study asks two questions: how useful is the Reddit social media platform for surveilling the COVID-19 pandemic, and how did people's concerns and behaviors change over the course of the COVID-19 pandemic in North Carolina? To answer them, we applied natural language processing (NLP) to COVID-19-related data and compared people's thoughts, behavior changes, and discussion topics with the number of confirmed cases and deaths.
Methods: We collected COVID-19-related data from 18 North Carolina subreddits from March to August 2020. We then analyzed the collected Reddit posts with natural language processing and machine learning methods: feature engineering, topic modeling, custom named-entity recognition (NER), and BERT-based (Bidirectional Encoder Representations from Transformers) sentence clustering. Using these methods, we were able to glean people's responses to, and concerns about, the COVID-19 pandemic in North Carolina.
Results: We observed a positive change in attitudes towards masks among residents of North Carolina. The high-frequency words across all subreddit corpora for each COVID-19 mitigation strategy category are: Distancing (DIST): "social distance/distancing", "lockdown", and "work from home"; Disinfection (DIT): "(hand) sanitizer/soap", "hygiene", and "wipe"; Personal Protective Equipment (PPE): "mask/facemask(s)/face shield", "n95(s)/kn95", and "cloth/gown"; Symptoms (SYM): "death", "flu/influenza", and "cough/coughed"; Testing (TEST): "cases", "(antibody) test", and "test results (positive/negative)".
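As a rough illustration of how posts could be tallied against the five mitigation categories, the sketch below counts keyword matches per category. It is not the custom NER model used in the study: the regular expressions, the `CATEGORY_PATTERNS` table, and the `tally_categories` helper are assumptions made for this example, seeded only with the high-frequency terms reported in the Results.

```python
import re
from collections import Counter

# Hypothetical keyword patterns per category, built from the terms reported
# in the Results. A trained NER model would replace this lookup in practice.
CATEGORY_PATTERNS = {
    "DIST": r"\b(?:social distanc(?:e|ing)|lockdown|work from home)\b",
    "DIT": r"\b(?:sanitizer|soap|hygiene|wipes?)\b",
    "PPE": r"\b(?:face ?masks?|masks?|face shields?|k?n95s?|gowns?)\b",
    "SYM": r"\b(?:deaths?|flu|influenza|cough(?:ed|ing)?)\b",
    "TEST": r"\b(?:test results?|antibody tests?|tests?|cases?)\b",
}
COMPILED = {cat: re.compile(pat) for cat, pat in CATEGORY_PATTERNS.items()}


def tally_categories(posts):
    """Count keyword matches per mitigation category across a list of posts."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for category, pattern in COMPILED.items():
            counts[category] += len(pattern.findall(text))
    return counts


if __name__ == "__main__":
    sample_posts = [
        "Wear a mask and keep social distance.",
        "Got my test results today, negative. No cough.",
    ]
    for category, n in sorted(tally_categories(sample_posts).items()):
        print(category, n)
```

Keyword matching of this kind cannot tell "test" as a COVID-19 test from other uses of the word, which is one reason a context-aware NER model is preferable for real analysis.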
Conclusion: Our findings show that Reddit data were effective for monitoring the COVID-19 pandemic in North Carolina (NC). The study demonstrates the utility of NLP methods (e.g., cosine similarity, Latent Dirichlet Allocation (LDA) topic modeling, custom NER, and BERT-based sentence clustering) for discovering how the public's concerns and behaviors changed over the course of the COVID-19 pandemic in NC. Moreover, the results show that social media data can be used to surveil the epidemic situation in a specific community.
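Of the methods named above, cosine similarity is the simplest to show concretely. The sketch below compares the vocabulary of two groups of posts (for example, two months) using raw term counts. The tokenizer and the `term_counts` and `cosine_similarity` helpers are assumptions for illustration; the abstract does not say which features or weighting the study used.

```python
import math
import re
from collections import Counter


def term_counts(texts):
    """Build a bag-of-words count vector from a list of post texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty corpus as sharing nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up posts, purely to exercise the functions.
    march = term_counts(["Stores are out of hand sanitizer", "lockdown starts"])
    august = term_counts(["Everyone is wearing a mask now", "lockdown is over"])
    print(round(cosine_similarity(march, august), 3))
```

A value near 1 means the two groups of posts use similar vocabulary in similar proportions, while a value near 0 means they share few terms.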