Background: The rapid growth of the biomedical literature makes identifying strong evidence a time-consuming task. Applying machine learning to the process could be a viable solution that limits effort while maintaining accuracy.
Objective: The goal of the research was to summarize the nature and comparative performance of machine learning approaches that have been applied to retrieve high-quality evidence for clinical consideration from the biomedical literature.
Methods: We conducted a systematic review of studies that applied machine learning techniques to identify high-quality clinical articles in the biomedical literature. Multiple databases were searched to July 2020. Extracted data focused on the applied machine learning model, the steps in the development of the models, and model performance.
Results: From 3918 retrieved studies, 10 met our inclusion criteria. All followed a supervised machine learning approach and applied, from a limited range of options, a high-quality standard for training their models. The results show that machine learning can achieve a sensitivity of 95% while maintaining a high precision of 86%.
Conclusions: Machine learning approaches perform well in retrieving high-quality clinical studies. Performance may improve further with more sophisticated approaches such as active learning and unsupervised machine learning.
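For readers less familiar with the reported metrics, sensitivity and precision follow directly from a classifier's confusion matrix. The short Python sketch below uses hypothetical counts (not figures from any included study) purely to illustrate the calculation.

```python
# Illustrative only: hypothetical confusion-matrix counts for a classifier
# that screens citations for high methodological quality.
tp, fn = 950, 50      # high-quality articles correctly / incorrectly flagged
fp, tn = 155, 8845    # low-quality articles incorrectly / correctly flagged

sensitivity = tp / (tp + fn)   # recall: share of high-quality articles retrieved
precision = tp / (tp + fp)     # share of retrieved articles that are high quality

print(f"sensitivity = {sensitivity:.2f}, precision = {precision:.2f}")
# With these made-up counts: sensitivity = 0.95, precision = 0.86,
# matching the best figures reported in the review.
```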
Background: Coronavirus disease 2019 (COVID-19) is a novel infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Despite the paucity of evidence, various complementary, alternative and integrative medicines (CAIMs) have been touted as both preventative and curative. We conducted sentiment and emotion analysis with the intent of understanding the CAIM content related to COVID-19 generated on Twitter across 9 months.
Methods: Tweets relating to CAIM and COVID-19 were extracted from the George Washington University Libraries Dataverse Coronavirus tweets dataset from March 3 to November 30, 2020. We trained and tested a machine learning classifier using a large, pre-labelled Twitter dataset and applied it to predict the sentiment of each CAIM-related tweet, and we used a natural language processing package to identify emotions based on the words contained in the tweets.
Results: Our dataset included 28,713 English-language tweets. The number of CAIM-related tweets peaked in May 2020, then dropped off sharply over the subsequent three months; the fewest CAIM-related tweets were collected during August 2020, and volume remained low for the remainder of the collection period. Most tweets (n = 15,612, 54%) were classified as positive, 31% were neutral (n = 8,803), and 15% were classified as negative (n = 4,298). The most frequent emotion expressed across tweets was trust, followed by fear, while surprise and disgust were the least frequent. Though the volume of tweets decreased over the 9 months of the study, the expressed sentiments and emotions remained constant.
Conclusion: The results of this sentiment analysis enabled us to establish the key CAIMs being discussed in connection with COVID-19 on Twitter across a 9-month period. Overall, the majority of the tweets in our subset were positive, as were the emotions associated with the words found within them. This may be interpreted as public support for CAIM; however, further qualitative investigation is warranted. Such future work may help combat misinformation and improve public health strategies surrounding the use of social media information.
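The abstract does not name the specific classifier or emotion package used. As an illustrative sketch only, the Python code below assumes a TF-IDF plus logistic regression sentiment model (scikit-learn) trained on a pre-labelled tweet corpus and the NRCLex package for word-based emotion scoring; the toy training examples are invented.

```python
# A minimal sketch of the two-step pipeline described above.
# Assumptions (not stated in the abstract): scikit-learn for the sentiment
# classifier and the NRCLex package for word-based emotion counts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from nrclex import NRCLex

# 1) Train a sentiment classifier on a pre-labelled tweet corpus (toy examples here).
train_texts = ["love this herbal remedy", "this cure is a scam", "saw a tweet about supplements"]
train_labels = ["positive", "negative", "neutral"]

sentiment_clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
sentiment_clf.fit(train_texts, train_labels)

# 2) Apply it to CAIM-related tweets and score emotions from the words used.
caim_tweets = ["Vitamin C tea keeps me safe from covid, I trust it!"]
for tweet in caim_tweets:
    sentiment = sentiment_clf.predict([tweet])[0]
    emotions = NRCLex(tweet).raw_emotion_scores  # counts of NRC emotion-lexicon words
    print(tweet, sentiment, emotions)
```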
Background: A barrier to practicing evidence-based medicine is the rapidly increasing body of biomedical literature. Use of method terms to limit the search can help reduce the burden of screening articles for clinical relevance; however, such terms are limited by their partial dependence on indexing terms and usually produce low precision, especially when high sensitivity is required. Machine learning has been applied to the identification of high-quality literature, with the potential to achieve high precision without sacrificing sensitivity. The use of artificial intelligence has shown promise in improving the efficiency of identifying sound evidence.
Objective: The primary objective of this research is to derive and validate deep learning models, using variations of Bidirectional Encoder Representations from Transformers (BERT), to retrieve high-quality, high-relevance evidence for clinical consideration from the biomedical literature.
Methods: Using the HuggingFace Transformers library, we will experiment with variations of BERT models, including BERT, BioBERT, BlueBERT, and PubMedBERT, to determine which has the best performance in identifying articles that meet quality criteria. Our experiments will use a large data set of over 150,000 PubMed citations from 2012 to 2020 that have been manually labeled based on their methodological rigor for clinical use. We will evaluate and report the performance of the classifiers in categorizing articles based on their likelihood of meeting quality criteria. We will report the fine-tuning hyperparameters for each model, as well as their performance metrics, including recall (sensitivity), specificity, precision, accuracy, F-score, the number of articles that need to be read before finding one that is positive (meets criteria), and classification probability scores.
Results: Initial model development is underway, with further development planned for early 2022. Performance testing is expected to start in February 2022. Results will be published in 2022.
Conclusions: The experiments will aim to improve the precision of retrieving high-quality articles by applying a machine learning classifier to PubMed searching.
International Registered Report Identifier (IRRID): DERR1-10.2196/29398
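As a rough sketch of the kind of fine-tuning this protocol describes, and not the authors' actual pipeline, the code below fine-tunes one assumed BERT-family checkpoint for a binary "meets quality criteria" label using the HuggingFace Transformers Trainer. The checkpoint name, hyperparameters, and toy citations are placeholders.

```python
# Illustrative fine-tuning sketch (assumed checkpoint and toy data, not the study's).
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

checkpoint = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"  # one candidate variant
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy citations labelled 1 = meets methodological quality criteria, 0 = does not.
data = Dataset.from_dict({
    "text": ["Randomized controlled trial of drug X ...", "Narrative review of topic Y ..."],
    "label": [1, 0],
})
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512), batched=True)

args = TrainingArguments(output_dir="quality-clf", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=data, tokenizer=tokenizer)
trainer.train()
```

In practice the same loop would be repeated for each candidate checkpoint (BERT, BioBERT, BlueBERT, PubMedBERT), with recall, specificity, precision, accuracy, and F-score computed on a held-out test split.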
BACKGROUND: The COVID-19 pandemic and associated public health mitigation strategies dramatically changed patterns of daily life activities worldwide. Public health restrictions during the pandemic had unintended consequences for chronic disease risk factors. Cancer is a leading chronic disease worldwide with several known modifiable risk factors, including smoking, alcohol, poor nutrition, and physical inactivity.
OBJECTIVE: The study objectives were to conduct a sentiment and emotion analysis using Twitter data to evaluate changes in attitudes towards four cancer risk factors (physical inactivity, poor nutrition, alcohol, and smoking) over time during the first year of the COVID-19 pandemic.
METHODS: Tweets from 2020 relating to COVID-19 and the four cancer risk factors were extracted from the George Washington University Libraries' Dataverse. Tweets were then filtered using keywords to create four separate datasets. We trained and tested a machine learning classifier using a pre-labelled Twitter dataset and applied it to classify the sentiment (positive, negative, or neutral) of each tweet. A natural language processing package was used to identify the emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) based on the words contained in the tweets. Sentiments and emotions related to each of the risk factors were evaluated over time, and word clouds were produced to examine common keywords that emerged.
RESULTS: The sentiment analysis revealed that 57% of tweets about physical activity were positive, 16% negative, and 27% neutral (n=90,813 tweets). Similar patterns were observed for nutrition, where 55%, 16%, and 29% of tweets were classified as positive, negative, and neutral, respectively (n=50,396 tweets). For alcohol, the proportions of positive, negative, and neutral tweets were 47%, 23%, and 30%, respectively (n=74,484 tweets), and for smoking the distribution was 41%, 24%, and 35%, respectively (n=28,220 tweets). The sentiments were relatively stable over time. Results from the emotion analysis suggest that the most common emotion expressed in physical activity and nutrition tweets was trust, whereas for alcohol the most common emotion was joy and for smoking it was fear. The emotions expressed remained relatively constant over the observed time period. Analysis of the word clouds revealed further insights into common themes expressed in relation to some of the risk factors and revealed possible sources of bias.
CONCLUSIONS: The results of this analysis provide insight into attitudes towards cancer risk factors as expressed on Twitter during the first year of the COVID-19 pandemic. Overall, for all four risk factors, most tweets had a positive sentiment, with varied emotions across the different datasets. While these results can play a role in promoting public health, more work is needed to understand how they can be translated into meaningful data to inform public health interventions in a timely manner.
CLINICALTRIAL: Not applicable.
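The keyword filtering and word-cloud steps might look roughly like the sketch below. The keyword lists, example tweets, and the use of the `wordcloud` package are assumptions for illustration, not details taken from the study.

```python
# Minimal sketch of keyword filtering into four risk-factor datasets plus word clouds.
# Keyword lists and example tweets are hypothetical.
from wordcloud import WordCloud, STOPWORDS

risk_factor_keywords = {
    "physical_activity": ["exercise", "workout", "sedentary", "physical activity"],
    "nutrition": ["diet", "nutrition", "junk food", "vegetables"],
    "alcohol": ["alcohol", "beer", "wine", "drinking"],
    "smoking": ["smoking", "cigarette", "vaping", "tobacco"],
}

def filter_tweets(tweets, keywords):
    """Keep tweets that mention any keyword for a given risk factor."""
    return [t for t in tweets if any(k in t.lower() for k in keywords)]

tweets = ["Lockdown killed my workout routine", "Drinking more wine than usual at home"]

for factor, keywords in risk_factor_keywords.items():
    subset = filter_tweets(tweets, keywords)
    if subset:
        # Build a word cloud from all tweets in this risk-factor subset.
        cloud = WordCloud(stopwords=STOPWORDS, width=800, height=400).generate(" ".join(subset))
        cloud.to_file(f"{factor}_wordcloud.png")
```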