Background: Digital misinformation, primarily on social media, has led to harmful and costly beliefs in the general population. Notably, these beliefs have resulted in public health crises to the detriment of governments worldwide and their citizens. However, public health officials need access to a comprehensive system capable of mining and analyzing large volumes of social media data in real time.

Objective: This study aimed to design and develop a big data pipeline and ecosystem (UbiLab Misinformation Analysis System [U-MAS]) to identify and analyze false or misleading information disseminated via social media on a certain topic or set of related topics.

Methods: U-MAS is a platform-independent ecosystem developed in Python that leverages the Twitter V2 application programming interface (API) and the Elastic Stack. The U-MAS expert system has 5 major components: data extraction framework, latent Dirichlet allocation (LDA) topic model, sentiment analyzer, misinformation classification model, and Elastic Cloud deployment (indexing of data and visualizations). The data extraction framework retrieves the data through the Twitter V2 API, using queries identified by public health experts. The LDA topic model, sentiment analyzer, and misinformation classification model are independently trained using a small, expert-validated subset of the extracted data. These models are then incorporated into U-MAS to analyze and classify the remaining data. Finally, the analyzed data are loaded into an index in the Elastic Cloud deployment and can then be presented on dashboards with advanced visualizations and analytics pertinent to infodemiology and infoveillance analysis.

Results: U-MAS performed efficiently and accurately. Independent investigators have successfully used the system to extract significant insights into a fluoride-related health misinformation use case (2016 to 2021). The system is currently used for a vaccine hesitancy use case (2007 to 2022) and a heat wave–related illnesses use case (2011 to 2022). Each component in the system for the fluoride misinformation use case performed as expected. The data extraction framework handles large amounts of data within short periods. The LDA topic models achieved relatively high coherence values (0.54), and the predicted topics were accurate and appropriate for the data. The sentiment analyzer performed at a correlation coefficient of 0.72 but could be improved in further iterations. The misinformation classifier attained a satisfactory correlation coefficient of 0.82 against expert-validated data. Moreover, the output dashboard and analytics hosted on the Elastic Cloud deployment are intuitive for researchers without a technical background and comprehensive in their visualization and analytics capabilities. In fact, the investigators of the fluoride misinformation use case have successfully used the system to extract interesting and important insights into public health, which have been published separately.

Conclusions: The novel U-MAS pipeline has the potential to detect and analyze misleading information related to a particular topic or set of related topics.
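The flow described in the Methods above (analyze each extracted tweet with the three models, then load the results into an Elastic index) can be illustrated with a minimal sketch. This is not the U-MAS implementation: the `AnalyzedTweet` record, the `analyze` and `to_bulk_ndjson` helpers, the keyword-based toy models, and the index name `umas-demo` are all hypothetical stand-ins. The only external convention relied on is the newline-delimited JSON layout accepted by the Elasticsearch bulk API (one action line followed by one document line).

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass
class AnalyzedTweet:
    """Hypothetical record holding one tweet plus the three model outputs."""
    tweet_id: str
    text: str
    topic: int
    sentiment: float
    is_misinformation: bool


def analyze(
    tweets: Iterable[Tuple[str, str]],
    topic_model: Callable[[str], int],
    sentiment_model: Callable[[str], float],
    misinformation_model: Callable[[str], bool],
) -> Iterator[AnalyzedTweet]:
    """Apply the three (independently trained) models to every tweet."""
    for tweet_id, text in tweets:
        yield AnalyzedTweet(
            tweet_id=tweet_id,
            text=text,
            topic=topic_model(text),
            sentiment=sentiment_model(text),
            is_misinformation=misinformation_model(text),
        )


def to_bulk_ndjson(docs: Iterable[AnalyzedTweet], index_name: str) -> str:
    """Serialize documents as an Elasticsearch bulk request body (NDJSON)."""
    lines = []
    for doc in docs:
        # Action line: index this document under its tweet id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc.tweet_id}}))
        # Source line: the analyzed document itself.
        lines.append(json.dumps(asdict(doc)))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Toy stand-ins for the trained models (assumptions, for illustration only).
def toy_topic(text: str) -> int:
    return 0


def toy_sentiment(text: str) -> float:
    return -1.0 if "poison" in text.lower() else 0.0


def toy_misinfo(text: str) -> bool:
    return "poison" in text.lower()


sample = [
    ("1", "Fluoride in tap water is poison"),
    ("2", "Brushed my teeth today"),
]
docs = list(analyze(sample, toy_topic, toy_sentiment, toy_misinfo))
payload = to_bulk_ndjson(docs, "umas-demo")
print(payload)
```

In a real deployment the payload would be sent to the cluster's bulk endpoint and the toy functions replaced by the trained LDA, sentiment, and classification models; passing the models in as callables keeps each component independently replaceable, in line with the component structure the abstract describes.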
Background: Although social media has the potential to spread misinformation, it can also be a valuable tool for elucidating the social factors that contribute to the onset of negative beliefs. As a result, data mining has become a widely used technique in infodemiology and infoveillance research to combat the effects of misinformation. However, few studies have specifically investigated misinformation about fluoride on Twitter. Individual concerns expressed on the web about the side effects of fluoridated oral care products and tap water stimulate the emergence and propagation of convictions that boost antifluoridation activism. In this sense, a previous content analysis–driven study demonstrated that the term "fluoride-free" was frequently associated with antifluoridation interests.

Objective: This study aimed to analyze “fluoride-free” tweets regarding their topics and frequency of publication over time.

Methods: A total of 21,169 tweets published in English between May 2016 and May 2022 that included the keyword “fluoride-free” were retrieved through the Twitter application programming interface. Latent Dirichlet allocation (LDA) topic modeling was applied to identify the salient terms and topics. The similarity between topics was assessed using an intertopic distance map. Moreover, an investigator manually assessed a sample of tweets depicting each of the most representative word groups that determined specific issues. Lastly, additional data visualization was performed on the total count of fluoride-free tweets per topic and the relevance of each topic over time, using Elastic Stack software.

Results: We identified 3 issues by applying the LDA topic modeling: “healthy lifestyle” (topic 1), “consumption of natural/organic oral care products” (topic 2), and “recommendations for using fluoride-free products/measures” (topic 3). Topic 1 was related to users’ concerns about leading a healthier lifestyle and the potential impacts of fluoride consumption, including its hypothetical toxicity. Topic 2 was associated with users’ personal interests in and perceptions of consuming natural and organic fluoride-free oral care products, whereas topic 3 was linked to users’ recommendations for using fluoride-free products (eg, switching from fluoridated toothpaste to fluoride-free alternatives) and measures (eg, consuming unfluoridated bottled water instead of fluoridated tap water), including the promotion of dental products. Additionally, the count of tweets on fluoride-free content decreased between 2016 and 2019 but increased again from 2020 onward.

Conclusions: Public concerns about a healthy lifestyle, including the adoption of natural and organic cosmetics, seem to be the main motivation for the recent increase in “fluoride-free” tweets, which can be boosted by the propagation of fluoride falsehoods on the web. Therefore, public health authorities, health professionals, and legislators should be aware of the spread of fluoride-free content on social media so that they can create and implement strategies against its potential harm to population health.
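An intertopic distance map, as mentioned in the Methods above, starts from pairwise distances between the topics' word distributions. The abstract does not state which distance was used; the sketch below assumes Jensen-Shannon divergence, a common choice for comparing LDA topics, and uses made-up word probabilities over a hypothetical 4-word vocabulary rather than any values from the study.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric, and bounded in [0, 1] with log base 2."""
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0, so KL is well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Illustrative topic-word distributions (assumed values, each sums to 1).
vocabulary = ["healthy", "natural", "toothpaste", "water"]
topics = {
    "topic 1": [0.70, 0.10, 0.10, 0.10],
    "topic 2": [0.10, 0.60, 0.25, 0.05],
    "topic 3": [0.05, 0.15, 0.40, 0.40],
}

# Pairwise distance matrix that a 2D projection would then be computed from.
names = list(topics)
for a in names:
    row = [f"{js_divergence(topics[a], topics[b]):.3f}" for b in names]
    print(a, row)
```

Visualization tools typically project such a distance matrix into two dimensions (for example, with multidimensional scaling) to draw the map; that projection step is omitted here.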