Unmasking the Twitter Discourses on Masks During the COVID-19 Pandemic: User Cluster–Based BERT Topic Modeling Approach

Xu, Weiai Wayne; Tshimula, Jean Marie; Dubé, Ève; Graham, Janice E.; Greyson, Devon; MacDonald, Noni E.; Meyer, Samantha B.

doi:10.2196/41198

Cited by 8 publications

(4 citation statements)

References 64 publications

(86 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Relying on patents as a proxy for innovation might not capture the full spectrum of grassroots and open-source solutions. The NLP method BERT also has its own limitations; it leaves out a nonnegligible portion of the corpus due to incongruent themes, which we reported in this study as outliers 37 . The clusters generated are not perfect, but they suffice for a quick, relatively easy, and comprehensible analysis of large amounts of data.…”

Section: Discussionmentioning

confidence: 81%

Exploring data analytics climate startups using BERTopic

Tea-makorn,

Maitreenukul,

Starita

et al. 2023

Preprint

View full text Add to dashboard Cite

Data analytics has recently gained significant momentum in tackling various challenges, climate change included. This work explores the sector, funding, and innovation landscape of climate startups that leverage data analytics. It applies a recently developed natural language processing algorithm, BERTopic, to organically discover sectors from company descriptions and innovation categories from patent abstracts. We classified data analytics climate startups into 37 sectors, with agriculture being the most prolific. We found that the state of California alone has more of these startups than any other country in the world; most funding rounds were raised in the Seed stage, and most companies exited by acquisition; most of the innovations generated are related to automation security, while the electric vehicle sector is most prolific in creating innovation. Investors, analysts, and entrepreneurs can use these methods and results to process large datasets and aid decision-making in investments and climate-related policies.

show abstract

Section: Discussionmentioning

confidence: 81%

Exploring data analytics climate startups using BERTopic

Tea-makorn,

Maitreenukul,

Starita

et al. 2023

Preprint

View full text Add to dashboard Cite

show abstract

“…Future work could consider using BERTopic which uses the pretrained Bidirectional Encoder Representations from Transformers (BERT) model as a feature extractor and apply clustering algorithms to identify latent topics in a text repository. BERT-topic also allows for the incorporation of human inputs such as topic labels or domain knowledge [ 48 ]. Our study did not focus on any differences in nutrition-related tweets based on geographies.…”

Section: Discussionmentioning

confidence: 99%

Investigating the Role of Nutrition in Enhancing Immunity During the COVID-19 Pandemic: Twitter Text-Mining Analysis

Shankar¹,

Ranganathan²,

Venkata³

et al. 2023

J Med Internet Res

View full text Add to dashboard Cite

Background The COVID-19 pandemic has brought to the spotlight the critical role played by a balanced and healthy diet in bolstering the human immune system. There is burgeoning interest in nutrition-related information on social media platforms like Twitter. There is a critical need to assess and understand public opinion, attitudes, and sentiments toward nutrition-related information shared on Twitter. Objective This study uses text mining to analyze nutrition-related messages on Twitter to identify and analyze how the general public perceives various food groups and diets for improving immunity to the SARS-CoV-2 virus. Methods We gathered 71,178 nutrition-related tweets that were posted between January 01, 2020, and September 30, 2020. The Correlated Explanation text mining algorithm was used to identify frequently discussed topics that users mentioned as contributing to immunity building against SARS-CoV-2. We assessed the relative importance of these topics and performed a sentiment analysis. We also qualitatively examined the tweets to gain a closer understanding of nutrition-related topics and food groups. Results Text-mining yielded 10 topics that users discussed frequently on Twitter, viz proteins, whole grains, fruits, vegetables, dairy-related, spices and herbs, fluids, supplements, avoidable foods, and specialty diets. Supplements were the most frequently discussed topic (23,913/71,178, 33.6%) with a higher proportion (20,935/23,913, 87.75%) exhibiting a positive sentiment with a score of 0.41. Consuming fluids (17,685/71,178, 24.85%) and fruits (14,807/71,178, 20.80%) were the second and third most frequent topics with favorable, positive sentiments. Spices and herbs (8719/71,178, 12.25%) and avoidable foods (8619/71,178, 12.11%) were also frequently discussed. Negative sentiments were observed for a higher proportion of avoidable foods (7627/8619, 84.31%) with a sentiment score of –0.39. Conclusions This study identified 10 important food groups and associated sentiments that users discussed as a means to improve immunity. Our findings can help dieticians and nutritionists to frame appropriate interventions and diet programs.

show abstract

“…The tweets included in the data set were not analyzed for possible bot activity, and bots can also spread misinformation [ 51 ]. However, bot presence was likely low, as retweets were excluded for this study, and bots usually retweet content without tweeting the original content [ 52 ]. An argument put forth against deleting bot tweets in a data set is that it is “artificially manipulating a raw data set,” as bots are naturally found on Twitter [ 53 ].…”

Section: Discussionmentioning

confidence: 99%

Interdisciplinary Approach to Identify and Characterize COVID-19 Misinformation on Twitter: Mixed Methods Study

Isip Tan,

Cleofas,

Solano

et al. 2023

JMIR Form Res

View full text Add to dashboard Cite

Background Studying COVID-19 misinformation on Twitter presents methodological challenges. A computational approach can analyze large data sets, but it is limited when interpreting context. A qualitative approach allows for a deeper analysis of content, but it is labor-intensive and feasible only for smaller data sets. Objective We aimed to identify and characterize tweets containing COVID-19 misinformation. Methods Tweets geolocated to the Philippines (January 1 to March 21, 2020) containing the words coronavirus, covid, and ncov were mined using the GetOldTweets3 Python library. This primary corpus (N=12,631) was subjected to biterm topic modeling. Key informant interviews were conducted to elicit examples of COVID-19 misinformation and determine keywords. Using NVivo (QSR International) and a combination of word frequency and text search using key informant interview keywords, subcorpus A (n=5881) was constituted and manually coded to identify misinformation. Constant comparative, iterative, and consensual analyses were used to further characterize these tweets. Tweets containing key informant interview keywords were extracted from the primary corpus and processed to constitute subcorpus B (n=4634), of which 506 tweets were manually labeled as misinformation. This training set was subjected to natural language processing to identify tweets with misinformation in the primary corpus. These tweets were further manually coded to confirm labeling. Results Biterm topic modeling of the primary corpus revealed the following topics: uncertainty, lawmaker’s response, safety measures, testing, loved ones, health standards, panic buying, tragedies other than COVID-19, economy, COVID-19 statistics, precautions, health measures, international issues, adherence to guidelines, and frontliners. These were categorized into 4 major topics: nature of COVID-19, contexts and consequences, people and agents of COVID-19, and COVID-19 prevention and management. Manual coding of subcorpus A identified 398 tweets with misinformation in the following formats: misleading content (n=179), satire and/or parody (n=77), false connection (n=53), conspiracy (n=47), and false context (n=42). The discursive strategies identified were humor (n=109), fear mongering (n=67), anger and disgust (n=59), political commentary (n=59), performing credibility (n=45), overpositivity (n=32), and marketing (n=27). Natural language processing identified 165 tweets with misinformation. However, a manual review showed that 69.7% (115/165) of tweets did not contain misinformation. Conclusions An interdisciplinary approach was used to identify tweets with COVID-19 misinformation. Natural language processing mislabeled tweets, likely due to tweets written in Filipino or a combination of the Filipino and English languages. Identifying the formats and discursive strategies of tweets with misinformation required iterative, manual, and emergent coding by human coders with experiential and cultural knowledge of Twitter. An interdisciplinary team composed of experts in health, health informatics, social science, and computer science combined computational and qualitative methods to gain a better understanding of COVID-19 misinformation on Twitter.

show abstract

Unmasking the Twitter Discourses on Masks During the COVID-19 Pandemic: User Cluster–Based BERT Topic Modeling Approach

Cited by 8 publications

References 64 publications

Exploring data analytics climate startups using BERTopic

Exploring data analytics climate startups using BERTopic

Investigating the Role of Nutrition in Enhancing Immunity During the COVID-19 Pandemic: Twitter Text-Mining Analysis

Interdisciplinary Approach to Identify and Characterize COVID-19 Misinformation on Twitter: Mixed Methods Study

Contact Info

Product

Resources

About