Systematic reviews and meta-analyses rank among the most highly regarded forms of research evidence. However, the screening phase requires an enormous effort: thousands of papers identified via systematic search must be read and labeled. Active learning-aided systematic reviewing offers a solution by combining machine learning algorithms with user input to reduce the screening load. This work explores the performance of these algorithms and different ways to apply them, through four studies that evaluate and improve the active learning pipeline. First, the performance and stability of the active learning pipeline were assessed via simulations and re-analysis of the outcomes. Second, a convolutional neural network was developed to improve upon the available machine learning algorithms. Third, the performance of different algorithm combinations was tested and compared. Finally, algorithm-switching models were built to increase performance. The work concludes with proposals for improving active learning-aided systematic reviews based on the combined findings of the four studies; notably, switching models were found to outperform the currently used models.
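To make the screening loop at the heart of this pipeline concrete, the sketch below shows certainty-based active learning with a TF-IDF feature extractor and a naive Bayes classifier from scikit-learn. It is an illustration of the general technique, not the ASReview implementation; the names `screen`, `texts`, and `labels` are assumptions introduced here.

```python
# Minimal sketch of a certainty-based active learning loop for
# screening prioritization (an illustration of the technique, not
# the ASReview implementation). Assumes `texts` holds title/abstract
# strings and `labels` the true relevance labels used by the
# simulated oracle (1 = relevant, 0 = irrelevant).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def screen(texts, labels, n_prior=10):
    X = TfidfVectorizer().fit_transform(texts)
    y = np.asarray(labels)
    labeled = list(range(n_prior))            # seed set; must contain both classes
    pool = list(range(n_prior, len(y)))
    order = list(labeled)                     # records in the order they are screened
    while pool:
        clf = MultinomialNB().fit(X[labeled], y[labeled])
        relevant_col = list(clf.classes_).index(1)
        probs = clf.predict_proba(X[pool])[:, relevant_col]
        nxt = pool.pop(int(np.argmax(probs))) # query the record most likely relevant
        labeled.append(nxt)                   # the oracle supplies its label
        order.append(nxt)
    return order
```

Retraining on every label collected so far and always presenting the most-likely-relevant record next is what pushes relevant papers toward the front of the screening order.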
Providing an overview of the strength of evidence for all published and hypothesized factors that contribute to the onset, relapse, and maintenance of anxiety, substance use, and depressive disorders (common mental disorders, CMDs) is of utmost importance. Thousands of articles have been published on potential factors of CMDs, yet a clear overview of all preceding factors and the interactions between them is missing. The main aim of the current project was therefore to create a database of potentially relevant papers obtained via a systematic search. The current paper describes every step of the process of constructing the database, from search query to final database. After a broad search and cleaning of the data, we screened a first set of papers with active learning using a shallow classifier. We then applied a second screening phase in which we switched to a different active learning model (a neural network) to identify papers that are difficult to find due to concept ambiguity. In a third round of screening, we checked for incorrectly included or excluded papers in a quality assessment procedure, resulting in the final database. All scripts, data files, and software output files are available via Zenodo (GitHub code), the Open Science Framework (protocols and output), and DANS (datasets) and are referenced in the relevant sections, making the project fully reproducible.
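The switch between models in the second screening phase can be pictured as a stopping rule followed by a re-ranking of the remaining pool. A minimal sketch follows, assuming hypothetical `rank_shallow`/`rank_deep` ranking functions, an `oracle` standing in for the reviewer, and a `patience` threshold of consecutive irrelevant records; none of these names or values come from the paper.

```python
# Hedged sketch of a two-phase switching strategy (rank_shallow,
# rank_deep, oracle, and patience are illustrative assumptions).
# Phase 1 screens with the shallow model until `patience` consecutive
# screened records are irrelevant; phase 2 re-ranks the remaining
# pool with the second model, e.g. a neural network.
def screen_with_switch(rank_shallow, rank_deep, oracle, pool, patience=50):
    order, misses = [], 0
    while pool and misses < patience:
        nxt = rank_shallow(pool)              # shallow model picks the next record
        pool.remove(nxt)
        order.append(nxt)
        misses = 0 if oracle(nxt) else misses + 1
    while pool:                               # phase 2: same loop, different model
        nxt = rank_deep(pool)
        pool.remove(nxt)
        order.append(nxt)
    return order
```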
This study investigated the utility of active learning in accelerating the systematic review process. We systematically reviewed the literature from 2006 onwards using the open-source software ASReview, selecting 48 relevant articles out of 1,548 and incorporating 208 of the 305 collected datasets. Despite some limitations, our analysis strongly supports the use of active learning to improve the efficiency of the screening phase of systematic reviews. Future research should focus on standardizing metrics, promoting open data, and diversifying models to advance active learning in systematic reviews.
Background: Conducting a systematic review demands a significant amount of effort in screening titles and abstracts. To accelerate this process, various tools that utilize active learning have been proposed. These tools allow the reviewer to interact with machine learning software to identify relevant publications as early as possible. The goal of this study is to gain a comprehensive understanding of active learning models for reducing the workload in systematic reviews through a simulation study.
Methods: The simulation study mimics the process of a human reviewer screening records while interacting with an active learning model. Different active learning models were compared based on four classification techniques (naive Bayes, logistic regression, support vector machines, and random forest) and two feature extraction strategies (TF-IDF and doc2vec). The performance of the models was compared across six systematic review datasets from different research areas. The models were evaluated on Work Saved over Sampling (WSS) and recall. Additionally, this study introduces two new statistics: Time to Discovery (TD) and Average Time to Discovery (ATD).
Results: The models reduce the number of publications that need to be screened by 63.9% to 91.7% while still finding 95% of all relevant records (WSS@95). Recall, defined as the proportion of relevant records found after screening 10% of all records, ranges from 53.6% to 99.8%. The ATD values, which indicate the average proportion of labeling decisions the researcher must make to detect a relevant record, range from 1.4% to 11.7%. Across the simulations, the ATD values rank the models similarly to the recall and WSS values.
Conclusions: Active learning models for screening prioritization demonstrate significant potential for reducing the workload in systematic reviews. The naive Bayes + TF-IDF model yielded the best results overall. The Average Time to Discovery (ATD) measures the performance of an active learning model throughout the entire screening process without the need for an arbitrary cut-off point, making it a promising metric for comparing models across datasets.
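Both evaluation metrics can be computed directly from the order in which a model presents the records. The sketch below follows the definitions given above: WSS as the work saved at a fixed recall level, TD as the proportion of labeling decisions made before a given relevant record is found, and ATD as the mean TD over all relevant records. The function names are illustrative assumptions, not the paper's code.

```python
# Sketch of the evaluation metrics, computed from a screening order
# (function names are illustrative). `order` lists record indices in
# the order they were screened; `y[i]` is 1 if record i is relevant.
import numpy as np

def wss(order, y, recall=0.95):
    # Work Saved over Sampling: proportion of records left unscreened
    # once the recall level is reached, minus the 1 - recall baseline.
    y = np.asarray(y)
    n, n_rel = len(y), int(y.sum())
    found = np.cumsum(y[order])                # relevant records found so far
    n_screened = int(np.argmax(found >= np.ceil(recall * n_rel))) + 1
    return (n - n_screened) / n - (1 - recall)

def atd(order, y):
    # Average Time to Discovery: mean proportion of labeling
    # decisions made before each relevant record is found.
    y = np.asarray(y)
    n = len(y)
    td = [(pos + 1) / n for pos, i in enumerate(order) if y[i] == 1]
    return float(np.mean(td))
```

Because ATD aggregates over every relevant record rather than a single recall threshold, it requires no arbitrary cut-off point, which is the property highlighted in the conclusions above.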