Active learning-based Systematic reviewing using switching classification models: the case of the onset, maintenance, and relapse of depressive disorders

Teijema, Jelle Jasper; Hofstee, Laura; Brouwer, Marlies; Bruin, Jonathan de; Ferdinands, Gerbrich; Boer, Jan de; Siso, Pablo Vizan; Brand, Sofie Arsenia Gabriëlla Eleonora van den; Bockting, Claudi; Schoot, Rens van de; Bagheri, Ayoub

doi:10.31234/osf.io/t7bpd

Cited by 12 publications

(22 citation statements)

References 13 publications

“…Firstly, the results of simulation studies were applied to new data. For example, which model to use in the first screening phase was decided based on a simulation study performed on a different, smaller, dataset about depression (Teijema et al, 2022).…”

Section: Conclusion and Discussion (mentioning)

confidence: 99%

“…Therefore, for the second screening phase, we first used the labeling decisions of the first round to optimize the hyperparameters per topic to receive the optimal hyperparameters for a 17-layer convolutional neural network (CNN) (Teijema, 2021) in combination with a doc2vec feature extractor. This model appeared to have better performance than the default deep learning models available in ASReview as was concluded in a simulation study conducted on similar data (Teijema et al, 2022). Using the optimized hyperparameters, we trained the 17-layer CNN model.…”

Section: Screening Phase 2: Deep Learning (mentioning)

confidence: 96%

“…In the first screening phase, we used logistic regression as the classifier and TF-IDF as the feature extractor. The settings of the active learning model were based on a simulation study performed by Teijema et al (2022) on the depression dataset of Brouwer et al (2019).…”

Section: Screening Phase 1: Active Learning (mentioning)

confidence: 99%

“…Simulation studies show that machine-learning-based prioritization with active learning enables us to find relevant studies much faster than traditional screening methods; it can save up to 95% of screening time (Van de Schoot, De Bruin, Schram, Zahedi, De Boer, Weijdema, Kramer, Huijts, Ferdinands, Harkema, Harkema, Willemsen, Ma, Fang, Sybren, et al, 2021). For example, a simulation study was performed on labeled data retrieved from a previously described meta-analysis (Brouwer et al, 2019) on the prospective evidence for leading psychological theories of depressive relapse (Teijema et al, 2022). The total number of papers found in the search was 50,936, of which 63 were included in the final meta-analysis (excluding the papers found with snowballing).…”

Section: Introduction (mentioning)

confidence: 99%

AI-aided Systematic Review to Create a Database with Potentially Relevant Papers on Depression, Anxiety, and Addiction

Brouwer, Hofstee, Brand, et al. 2022. Preprint. Self cite.

It is of utmost importance to provide an overview and strength of evidence of predictive factors and to investigate the current state of affairs on evidence for all published and hypothesized factors that contribute to the onset, relapse, and maintenance of anxiety-, substance use-, and depressive disorders. Thousands of such articles have been published on potential factors of CMDs, yet a clear overview of all preceding factors and interaction between factors is missing. Therefore, the main aim of the current project was to create a database with potentially relevant papers obtained via a systematic. The current paper describes every step of the process of constructing the database, from search query to database. After a broad search and cleaning of the data, we used active learning using a shallow classifier and labeled the first set of papers. Then, we applied a second screening phase in which we switched to a different active learning model (i.e., a neural net) to identify difficult-to-find papers due to concept ambiguity. In the third round of screening, we checked for incorrectly included/excluded papers in a quality assessment procedure resulting in the final database. All scripts, data files, and output files of the software are available via Zenodo (for Github code), the Open Science Framework (for protocols, output), and DANS (for the datasets) and are referred to in the specific sections, thereby making the project fully reproducible.

“…Lastly, we could take an even broader view and argue that the set of included records is also allowed to vary as long as the conclusion coming out of the post-processing stage is still the same. Reproducing a systematic review in this way gives confidence that missing a specific record is not essential for the final conclusions; see for an example [33].…”

Section: Reproducibility In the Three Phases (mentioning)

confidence: 99%

Reproducibility and Data storage Checklist for Active Learning-Aided Systematic Reviews

Lombaers, Bruin, Schoot. 2023. Preprint.

In the screening phase of a systematic review, screening prioritization via active learning effectively reduces the workload. However, the PRISMA guidelines are not sufficient for reporting the screening phase in a reproducible manner. Text screening with active learning is an iterative process, but the labeling decisions and the training of the active learning model can happen independently of each other in time. So it is not trivial to store the data from both events so that you can still know which iteration of the model was used for each labeling decision. Moreover, many iterations of the active learning model will be trained throughout the screening process, producing an enormous amount of data (think of many gigabytes or even terabytes of data), and machine learning models are continually becoming larger. Together this can add up to an undesirable amount of data when naively storing all the data produced at every iteration of the active learning pipeline. This article clarifies the steps in an active learning-aided screening process and what data is produced at every step. We show how this data can be stored efficiently in terms of size. Most notably, the data produced by the model is where we need to strike a balance between reproducibility and storage size. Finally, we created the RDAL-Checklist (Reproducibility and Data storage for Active Learning-aided systematic reviews – checklist) that helps users and creators of active learning software make their screening process reproducible.
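The bookkeeping problem described in this abstract (knowing which model iteration was active for each labeling decision) can be illustrated with a small log structure. This is only a minimal sketch and not the RDAL-Checklist or any ASReview data format; the class names `ModelIteration`, `LabelingDecision`, and `ScreeningLog` and all of their fields are hypothetical. It stores lightweight metadata per model iteration rather than the trained model itself, which is one possible way to trade storage size against reproducibility.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ModelIteration:
    """Metadata for one trained model (hypothetical fields, not a real format)."""
    iteration: int
    classifier: str
    n_training_labels: int  # number of labels available when training started


@dataclass(frozen=True)
class LabelingDecision:
    record_id: int
    label: int  # 1 = relevant, 0 = irrelevant
    # Iteration of the most recently trained model when the label was given;
    # None means the record was labeled before any model existed.
    model_iteration: Optional[int]


@dataclass
class ScreeningLog:
    models: List[ModelIteration] = field(default_factory=list)
    decisions: List[LabelingDecision] = field(default_factory=list)

    def register_model(self, classifier: str) -> int:
        """Record that a new model finished training; return its iteration."""
        model = ModelIteration(len(self.models), classifier, len(self.decisions))
        self.models.append(model)
        return model.iteration

    def record_decision(self, record_id: int, label: int) -> LabelingDecision:
        """Store a label together with the latest available model iteration."""
        latest = self.models[-1].iteration if self.models else None
        decision = LabelingDecision(record_id, label, latest)
        self.decisions.append(decision)
        return decision


# Usage: labeling and training interleave, and the log keeps them linked.
log = ScreeningLog()
log.record_decision(101, 1)  # prior knowledge, no model yet
log.record_decision(102, 0)
log.register_model("logistic regression + TF-IDF")
log.record_decision(103, 1)
log.register_model("neural network + doc2vec")
log.record_decision(104, 0)
```

Because each decision carries only an integer reference to a model iteration, the log stays small even when many iterations are trained; re-creating a given model would then rely on the stored metadata and the labels that preceded it.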

Performance of active learning models for screening prioritization in systematic reviews: a simulation study into the Average Time to Discover relevant records

et al. 2023. Self cite.

Background: Conducting a systematic review demands a significant amount of effort in screening titles and abstracts. To accelerate this process, various tools that utilize active learning have been proposed. These tools allow the reviewer to interact with machine learning software to identify relevant publications as early as possible. The goal of this study is to gain a comprehensive understanding of active learning models for reducing the workload in systematic reviews through a simulation study.

Methods: The simulation study mimics the process of a human reviewer screening records while interacting with an active learning model. Different active learning models were compared based on four classification techniques (naive Bayes, logistic regression, support vector machines, and random forest) and two feature extraction strategies (TF-IDF and doc2vec). The performance of the models was compared for six systematic review datasets from different research areas. The evaluation of the models was based on the Work Saved over Sampling (WSS) and recall. Additionally, this study introduces two new statistics, Time to Discovery (TD) and Average Time to Discovery (ATD).

Results: The models reduce the number of publications that need to be screened by 63.9% to 91.7% while still finding 95% of all relevant records (WSS@95). Recall of the models was defined as the proportion of relevant records found after screening 10% of all records and ranges from 53.6% to 99.8%. The ATD values range from 1.4% to 11.7% and indicate the average proportion of labeling decisions the researcher needs to make to detect a relevant record. The ATD values display a similar ranking across the simulations as the recall and WSS values.

Conclusions: Active learning models for screening prioritization demonstrate significant potential for reducing the workload in systematic reviews. The Naive Bayes + TF-IDF model yielded the best results overall. The Average Time to Discovery (ATD) measures the performance of active learning models throughout the entire screening process without the need for an arbitrary cut-off point. This makes the ATD a promising metric for comparing the performance of different models across different datasets.
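The three statistics named in this abstract can be sketched for a single simulated screening run. The function below is an illustrative reading of the definitions given above, not the authors' implementation: the name `screening_metrics`, its parameters, and the rounding of the 10% cut-off to the nearest record are assumptions, and details such as the handling of prior-knowledge records are ignored.

```python
from math import ceil


def screening_metrics(labels, recall_level=0.95, screen_fraction=0.10):
    """Compute ATD, WSS and recall for one screening order.

    labels: sequence of 0/1 relevance labels in the order the records
    were screened (hypothetical input format).
    """
    n = len(labels)
    n_relevant = sum(labels)
    if n == 0 or n_relevant == 0:
        raise ValueError("need at least one record and one relevant record")

    # Time to Discovery (TD): 1-based screening position of each relevant record.
    td = [pos for pos, y in enumerate(labels, start=1) if y == 1]

    # Average Time to Discovery (ATD): mean TD as a proportion of all records.
    atd = sum(td) / len(td) / n

    # WSS at the given recall level: proportion of records left unscreened
    # when that recall is reached, minus the (1 - recall_level) a random
    # order would be expected to leave.
    needed = ceil(recall_level * n_relevant)
    screened_at_recall = td[needed - 1]
    wss = (n - screened_at_recall) / n - (1 - recall_level)

    # Recall after screening a fixed fraction of all records
    # (cut-off rounded to the nearest record; an assumption of this sketch).
    cutoff = round(screen_fraction * n)
    recall = sum(labels[:cutoff]) / n_relevant

    return {"atd": atd, "wss": wss, "recall": recall}


# Toy run: 10 records, 3 relevant, found at positions 1, 2 and 4.
metrics = screening_metrics([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])
```

Unlike WSS and recall, the ATD in this sketch needs no cut-off parameter, which matches the motivation the abstract gives for introducing it.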
