Objective: We present a narrative review of recent work on the utilisation of Natural Language Processing (NLP) for the analysis of social media (including online health communities) specifically for public health applications. Methods: We conducted a literature review of NLP research that utilised social media or online consumer-generated text for public health applications, focussing on the years 2016 to 2018. Papers were identified in several ways, including PubMed searches and the inspection of recent conference proceedings from the Association of Computational Linguistics (ACL), the Conference on Human Factors in Computing Systems (CHI), and the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM). Popular data sources included Twitter, Reddit, various online health communities, and Facebook. Results: In the recent past, communicable diseases (e.g., influenza, dengue) have been the focus of much social media-based NLP health research. However, mental health and substance use and abuse (including the use of tobacco, alcohol, marijuana, and opioids) have been the subject of an increasing volume of research in the 2016 - 2018 period. Associated with this trend, the use of lexicon-based methods remains popular given the availability of psychologically validated lexical resources suitable for mental health and substance abuse research. Finally, we found that in the period under review “modern" machine learning methods (i.e. deep neural-network-based methods), while increasing in popularity, remain less widely used than “classical" machine learning methods.
Background Increases in electronic nicotine delivery system (ENDS) use among high school students from 2017 to 2019 appear to be associated with the increasing popularity of the ENDS device JUUL. Objective We employed a content analysis approach in conjunction with natural language processing methods using Twitter data to understand salient themes regarding JUUL use on Twitter, sentiment towards JUUL, and underage JUUL use. Methods Between July 2018 and August 2019, 11,556 unique tweets containing a JUUL-related keyword were collected. We manually annotated 4000 tweets for JUUL-related themes of use and sentiment. We used 3 machine learning algorithms to classify positive and negative JUUL sentiments as well as underage JUUL mentions. Results Of the annotated tweets, 78.80% (3152/4000) contained a specific mention of JUUL. Only 1.43% (45/3152) of tweets mentioned using JUUL as a method of smoking cessation, and only 6.85% (216/3152) of tweets mentioned the potential health effects of JUUL use. Of the machine learning methods used, the random forest classifier was the best performing algorithm among all 3 classification tasks (ie, positive sentiment, negative sentiment, and underage JUUL mentions). Conclusions Our findings suggest that a vast majority of Twitter users are not using JUUL to aid in smoking cessation nor do they mention the potential health benefits or detriments of JUUL use. Using machine learning algorithms to identify tweets containing underage JUUL mentions can support the timely surveillance of JUUL habits and opinions, further assisting youth-targeted public health intervention strategies.
Background: Perceptions of tobacco, cannabis, and electronic nicotine delivery systems (ENDS) are continually evolving in the United States. Exploring these characteristics through user generated text sources may provide novel insights into product use behavior that are challenging to identify using survey-based methods. The objective of this study was to compare the topics frequently discussed among Reddit members in cannabis, tobacco, and ENDS-specific subreddits.Methods: We collected 643,070 posts on the social media site Reddit between January 2013 and December 2018. We developed and validated an annotation scheme, achieving a high level of agreement among annotators. We then manually coded a subset of 2,630 posts for their content with relation to experiences and use of the three products of interest, and further developed word cloud representations of the words contained in these posts. Finally, we applied Latent Dirichlet Allocation (LDA) topic modeling to the 643,070 posts to identify emerging themes related to cannabis, tobacco, and ENDS products being discussed on Reddit.Results: Our manual annotation process yielded 2,148 (81.6%) posts that contained a mention(s) of either cannabis, tobacco, or ENDS with 1,537 (71.5%) of these posts mentioning cannabis, 421 (19.5%) mentioning ENDS, and 264 (12.2%) mentioning tobacco. In cannabis-specific subreddits, personal experiences with cannabis, cannabis legislation, health effects of cannabis use, methods and forms of cannabis, and the cultivation of cannabis were commonly discussed topics. The discussion in tobacco-specific subreddits often focused on the discussion of brands and types of combustible tobacco, as well as smoking cessation experiences and advice. In ENDS-specific subreddits, topics often included ENDS accessories and parts, flavors and nicotine solutions, procurement of ENDS, and the use of ENDS for smoking cessation.Conclusion: Our findings highlight the posting and participation patterns of Reddit members in cannabis, tobacco, and ENDS-specific subreddits and provide novel insights into aspects of personal use regarding these products. These findings complement epidemiologic study designs and highlight the potential of using specific subreddits to explore personal experiences with cannabis, ENDS, and tobacco products.
Background Since COVID-19 was declared a pandemic by the World Health Organization on March 11, 2020, the disease has had an unprecedented impact worldwide. Social media such as Reddit can serve as a resource for enhancing situational awareness, particularly regarding monitoring public attitudes and behavior during the crisis. Insights gained can then be utilized to better understand public attitudes and behaviors during the COVID-19 crisis, and to support communication and health-promotion messaging. Objective The aim of this study was to compare public attitudes toward the 2020-2021 COVID-19 pandemic across four predominantly English-speaking countries (the United States, the United Kingdom, Canada, and Australia) using data derived from the social media platform Reddit. Methods We utilized a topic modeling natural language processing method (more specifically latent Dirichlet allocation). Topic modeling is a popular unsupervised learning technique that can be used to automatically infer topics (ie, semantically related categories) from a large corpus of text. We derived our data from six country-specific, COVID-19–related subreddits (r/CoronavirusAustralia, r/CoronavirusDownunder, r/CoronavirusCanada, r/CanadaCoronavirus, r/CoronavirusUK, and r/coronavirusus). We used topic modeling methods to investigate and compare topics of concern for each country. Results Our consolidated Reddit data set consisted of 84,229 initiating posts and 1,094,853 associated comments collected between February and November 2020 for the United States, the United Kingdom, Canada, and Australia. The volume of posting in COVID-19–related subreddits declined consistently across all four countries during the study period (February 2020 to November 2020). During lockdown events, the volume of posts peaked. The UK and Australian subreddits contained much more evidence-based policy discussion than the US or Canadian subreddits. Conclusions This study provides evidence to support the contention that there are key differences between salient topics discussed across the four countries on the Reddit platform. Further, our approach indicates that Reddit data have the potential to provide insights not readily apparent in survey-based approaches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.