Cochrane Centralised Search Service showed high sensitivity identifying randomized controlled trials: A retrospective analysis

Redmond

Lamé

et al. 2021

BMC Med Res Methodol

Background: Crowdsourcing engages the help of large numbers of people in tasks, activities or projects, usually via the internet. One application of crowdsourcing is the screening of citations for inclusion in a systematic review. There is evidence that a ‘Crowd’ of non-specialists can reliably identify quantitative studies, such as randomized controlled trials, through the assessment of study titles and abstracts. In this feasibility study, we investigated crowd performance of an online, topic-based citation-screening task, assessing titles and abstracts for inclusion in a single mixed-studies systematic review. Methods: This study was embedded within a mixed studies systematic review of maternity care, exploring the effects of training healthcare professionals in intrapartum cardiotocography. Citation-screening was undertaken via Cochrane Crowd, an online citizen science platform enabling volunteers to contribute to a range of tasks identifying evidence in health and healthcare. Contributors were recruited from users registered with Cochrane Crowd. Following completion of task-specific online training, the crowd and the review team independently screened 9546 titles and abstracts. The screening task was subsequently repeated with a new crowd following minor changes to the crowd agreement algorithm based on findings from the first screening task. We assessed the crowd decisions against the review team categorizations (the ‘gold standard’), measuring sensitivity, specificity, time and task engagement. Results: Seventy-eight crowd contributors completed the first screening task. Sensitivity (the crowd’s ability to correctly identify studies included within the review) was 84% (N = 42/50), and specificity (the crowd’s ability to correctly identify excluded studies) was 99% (N = 9373/9493). Task completion was 33 h for the crowd and 410 h for the review team; mean time to classify each record was 6.06 s for each crowd participant and 3.96 s for review team members. Replicating this task with 85 new contributors and an altered agreement algorithm found 94% sensitivity (N = 48/50) and 98% specificity (N = 9348/9493). Contributors reported positive experiences of the task. Conclusion: It might be feasible to recruit and train a crowd to accurately perform topic-based citation-screening for mixed studies systematic reviews, though resource expended on the necessary customised training required should be factored in. In the face of long review production times, crowd screening may enable a more time-efficient conduct of reviews, with minimal reduction of citation-screening accuracy, but further research is needed.
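As a worked check of the figures reported for the first screening task, the sketch below recomputes sensitivity and specificity from the counts given in the abstract. The function and variable names are illustrative assumptions and are not taken from the study.

```python
def sensitivity(true_positives, false_negatives):
    """Share of gold-standard included records that the crowd also included."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of gold-standard excluded records that the crowd also excluded."""
    return true_negatives / (true_negatives + false_positives)


# Counts from the first screening task as reported in the abstract:
# 42 of 50 included records identified, 9373 of 9493 excluded records rejected.
first_task_sensitivity = sensitivity(true_positives=42, false_negatives=50 - 42)
first_task_specificity = specificity(true_negatives=9373, false_positives=9493 - 9373)

print(f"sensitivity: {first_task_sensitivity:.0%}")
print(f"specificity: {first_task_specificity:.0%}")
```

Both measures are computed against the review team's decisions, which the abstract treats as the gold standard.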

Section: Discussion (mentioning)

confidence: 99%

Crowdsourcing citation-screening in a mixed-studies systematic review: a feasibility study

Redmond

Lamé

et al. 2021

BMC Med Res Methodol

“…First, current practice is to identify RCTs through searches of bibliographic databases using highly sensitive RCT filters. Such filters have low precision, retrieving as many as 20 non-RCTs for every true RCT [12]. These irrelevant articles then need to be manually screened and removed.…”

Section: What Is the Implication And What Should Change Now? (mentioning)

confidence: 99%
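The quoted ratio of as many as 20 non-RCTs for every true RCT can be turned into a precision figure with one line of arithmetic. The sketch below is only an illustration of that conversion; the function name is a hypothetical helper, not something from the cited paper.

```python
def precision_from_ratio(false_hits_per_true_hit):
    """Precision when each true hit is retrieved alongside this many false hits."""
    return 1 / (1 + false_hits_per_true_hit)


# "As many as 20 non-RCTs for every true RCT" means 1 relevant record in 21.
worst_case_precision = precision_from_ratio(20)
print(f"precision: {worst_case_precision:.1%}")
```

At that ratio fewer than 5% of retrieved records are true RCTs, which is the manual screening burden the quoted statement describes.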

“…The interlinked system or "workflow" is known as the Cochrane "Evidence Pipeline." Here we describe the machine learning component of the Pipeline workflow; the other components (the Cochrane Crowd and a Centralised Search Service) are detailed elsewhere [12,13]. The reason that this is so beneficial for Cochrane Reviews is twofold.…”

Section: Introduction (mentioning)

confidence: 99%

Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane Reviews

Thomas

McDonald

Journal of Clinical Epidemiology

et al. 2021

128

Objectives: This study developed, calibrated, and evaluated a machine learning classifier designed to reduce study identification workload in Cochrane for producing systematic reviews. Methods: A machine learning classifier for retrieving randomized controlled trials (RCTs) was developed (the "Cochrane RCT Classifier"), with the algorithm trained using a data set of title-abstract records from Embase, manually labeled by the Cochrane Crowd. The classifier was then calibrated using a further data set of similar records manually labeled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had abstracts of sufficient length to allow machine classification. Results: The Cochrane RCT Classifier was trained using 280,620 records (20,454 of which reported RCTs). A classification threshold was set using 49,025 calibration records (1,587 of which reported RCTs), and our bootstrap validation found the classifier had recall of 0.99 (95% confidence interval 0.98–0.99) and precision of 0.08 (95% confidence interval 0.06–0.12) in this data set. The final, calibrated RCT classifier correctly retrieved 43,783 (99.5%) of 44,007 RCTs included in Cochrane Reviews but missed 224 (0.5%). Older records were more likely to be missed than those more recently published. Conclusions: The Cochrane RCT Classifier can reduce manual study identification workload for Cochrane Reviews, with a very low and acceptable risk of missing eligible RCTs. This classifier now forms part of the Evidence Pipeline, an integrated workflow deployed within Cochrane to help improve the efficiency of the study identification processes that support systematic review production.
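The calibration step described in this abstract, setting a classification threshold that aims for a target recall, can be illustrated with a minimal sketch. This is not the Cochrane RCT Classifier's actual procedure: the function names, the rule that a record counts as positive when its score is at or above the threshold, and the tiny synthetic calibration set are all assumptions made for illustration. Only the final evaluation counts (43,783 of 44,007) come from the abstract.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Highest threshold at which recall on the calibration set meets the target."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("calibration set contains no positive records")
    # Number of positive records that must score at or above the threshold.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]


def recall_and_precision(scores, labels, threshold):
    """Recall and precision when records scoring >= threshold are kept."""
    predicted = [s >= threshold for s in scores]
    true_pos = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    predicted_pos = sum(predicted)
    recall = true_pos / sum(labels)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return recall, precision


# Synthetic calibration data (hypothetical): classifier scores and 1 = RCT, 0 = not RCT.
scores = [0.9, 0.8, 0.75, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0]

for target in (0.75, 1.0):
    cut = threshold_for_recall(scores, labels, target)
    recall, precision = recall_and_precision(scores, labels, cut)
    print(f"target {target:.2f}: threshold {cut}, recall {recall:.2f}, precision {precision:.2f}")

# Recall on the evaluation set, from the counts reported in the abstract.
evaluation_recall = 43783 / 44007
print(f"evaluation recall: {evaluation_recall:.1%}")
```

Lowering the threshold to reach higher recall admits more non-RCTs, which is the trade-off behind the reported combination of 0.99 recall and 0.08 precision.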

“…This stage tested the cut point by assessing the classifier's recall: its ability to correctly classify RCTs included in Cochrane reviews, as RCTs. Cochrane now uses the RCT classifier in its process to identify possible reports of RCTs as part of its Centralized Search Service initiative [6]. It is also used as part of the study identification process for individual Cochrane reviews through a workflow called Screen4Me (S4M) (described below).…”

Section: RCT Classifier (mentioning)

confidence: 99%

Citation screening using crowdsourcing and machine learning produced accurate results: Evaluation of Cochrane's modified Screen4Me service

Journal of Clinical Epidemiology

Dooley

Affengruber

et al. 2021

Objectives: To assess the feasibility of a modified workflow that uses machine learning and crowdsourcing to identify studies for potential inclusion in a systematic review. Study Design and Setting: This was a substudy to a larger randomized study; the main study sought to assess the performance of single screening search results versus dual screening. This substudy assessed the performance in identifying relevant randomized controlled trials (RCTs) for a published Cochrane review of a modified version of Cochrane's Screen4Me workflow which uses crowdsourcing and machine learning. We included participants who had signed up for the main study but who were not eligible to be randomized to the two main arms of that study. The records were put through the modified workflow where a machine learning classifier divided the data set into "Not RCTs" and "Possible RCTs." The records deemed "Possible RCTs" were then loaded into a task created on the Cochrane Crowd platform, and participants classified those records as either "Potentially relevant" or "Not relevant" to the review. Using a prespecified agreement algorithm, we calculated the performance of the crowd in correctly identifying the studies that were included in the review (sensitivity) and correctly rejecting those that were not included (specificity). Results: The RCT machine learning classifier did not reject any of the included studies. In terms of the crowd, 112 participants were included in this substudy. Of these, 81 completed the training module and went on to screen records in the live task. Applying the Cochrane Crowd agreement algorithm, the crowd achieved 100% sensitivity and 80.71% specificity. Conclusions: Using a crowd to screen search results for systematic reviews can be an accurate method as long as the agreement algorithm in place is robust.
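The abstract does not specify the agreement algorithm, so the following is only a hypothetical sketch of what such a rule could look like: a record receives a final decision once a set number of consecutive crowd classifications agree, and is otherwise left unresolved. The function name, the `required` parameter, its default of three, and the "unresolved" outcome are all assumptions for illustration and should not be read as Cochrane Crowd's actual algorithm.

```python
def consecutive_agreement(votes, required=3):
    """Return the first label reached by `required` identical votes in a row.

    Hypothetical rule for illustration only. Returns "unresolved" when no
    label reaches the required run length.
    """
    run_label, run_length = None, 0
    for vote in votes:
        if vote == run_label:
            run_length += 1
        else:
            # A differing vote starts a new run.
            run_label, run_length = vote, 1
        if run_length >= required:
            return run_label
    return "unresolved"


# Example votes for two records, using the task's two categories.
agreed = ["Not relevant", "Potentially relevant", "Potentially relevant", "Potentially relevant"]
split = ["Not relevant", "Potentially relevant", "Not relevant"]

print(consecutive_agreement(agreed))
print(consecutive_agreement(split))
```

Under a rule of this kind, raising `required` would demand more agreement before a record is accepted or rejected, at the cost of more classifications per record; how any such setting affects sensitivity and specificity would need to be measured against the review's included studies.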