Background
Systematic reviews (SRs) are often cited as the highest level of evidence available as they involve the identification and synthesis of published studies on a topic. Unfortunately, it is increasingly challenging for small teams to complete SR procedures in a reasonable time period, given the exponential rise in the volume of primary literature. Crowdsourcing has been postulated as a potential solution.
Objective
The feasibility objective of this study was to determine whether a crowd would be willing to perform and complete abstract and full text screening. The validation objective was to assess the quality of the crowd’s work, including retention of eligible citations (sensitivity) and work performed for the investigative team, defined as the percentage of citations excluded by the crowd.
Methods
We performed a prospective study evaluating crowdsourcing essential components of an SR, including abstract screening, document retrieval, and full text assessment. Using CrowdScreenSR citation screening software, 2323 articles from 6 SRs were available to an online crowd. Citations excluded by less than or equal to 75% of the crowd were moved forward for full text assessment. For the validation component, performance of the crowd was compared with citation review through the accepted, gold standard, trained expert approach.
Results
Of 312 potential crowd members, 117 (37.5%) commenced abstract screening and 71 (22.8%) completed the minimum requirement of 50 citation assessments. The majority of participants were undergraduate or medical students (192/312, 61.5%). The crowd screened 16,988 abstracts (median: 8 per citation; interquartile range [IQR] 7-8), and all citations achieved the minimum of 4 assessments after a median of 42 days (IQR 26-67). Crowd members retrieved 83.5% (774/927) of the articles that progressed to the full text phase. A total of 7604 full text assessments were completed (median: 7 per citation; IQR 3-11). Citations from all but 1 review achieved the minimum of 4 assessments after a median of 36 days (IQR 24-70), with 1 review remaining incomplete after 3 months. When complete crowd member agreement at both levels was required for exclusion, sensitivity was 100% (95% CI 97.9-100) and work performed was calculated at 68.3% (95% CI 66.4-70.1). Using the predefined alternative 75% exclusion threshold, sensitivity remained 100% and work performed increased to 72.9% (95% CI 71.0-74.6;
P
<.001). Finally, when a simple majority threshold was considered, sensitivity decreased marginally to 98.9% (95% CI 96.0-99.7;
P
=.25) and work performed increased substantially to 80.4% (95% CI 78.7-82.0;
P
<.001).
Conclusions
Crowdsourcing of citation screening for SRs is feasible and has reasonable sensitivity and specificity. By expediting the screening process, crowdsourcing could permit the i...