Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is limited. We tested the accuracy of RobotReviewer, a semi-autonomous risk of bias (RoB) assessment tool, and its agreement with human reviewers.

Methods: Two reviewers independently conducted RoB assessments on a sample of randomized controlled trials (RCTs), and their consensus ratings were compared with those generated by RobotReviewer. Agreement with the human reviewers was assessed using percent agreement and weighted kappa (κ). The accuracy of RobotReviewer was also assessed by calculating the sensitivity, specificity, and area under the curve in comparison to the consensus agreement of the human reviewers.
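As an illustration of the agreement statistic used in the Methods, the following is a minimal sketch of a linearly weighted kappa between two raters. The three-level RoB coding (0 = low, 1 = unclear, 2 = high) and the choice of linear weights are assumptions for illustration, not details stated in the abstract.

```python
def weighted_kappa(r1, r2, k=3):
    """Linearly weighted Cohen's kappa for two raters.

    r1, r2: equal-length sequences of integer ratings in {0, ..., k-1},
    e.g. RoB judgements coded 0 = low, 1 = unclear, 2 = high (assumed coding).
    """
    n = len(r1)
    # Observed joint proportions.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n
    # Marginal proportions for each rater.
    p1 = [sum(obs[i][j] for j in range(k)) for i in range(k)]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Linear disagreement weight: 0 on the diagonal, 1 at maximum disagreement.
    w = lambda i, j: abs(i - j) / (k - 1)
    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected
```

In practice a library routine such as scikit-learn's `cohen_kappa_score(y1, y2, weights="linear")` gives the same statistic; the hand-rolled version is shown only to make the observed-versus-expected disagreement explicit.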
Results: The study included 372 RCTs. Inter-rater reliability ranged from κ = −0.06 (no agreement) for blinding of participants and personnel to κ = 0.62 (good agreement) for random sequence generation (excluding overall RoB). RobotReviewer was found to use a high percentage of "irrelevant supporting quotations" to complement RoB assessments for blinding of participants and personnel (72.6%), blinding of outcome assessment (70.4%), and allocation concealment (54.3%).
Conclusion: RobotReviewer can help with risk of bias assessment of RCTs but cannot replace human evaluations. Reviewers should therefore check and validate RoB assessments from RobotReviewer by consulting the original article whenever RobotReviewer provides irrelevant supporting quotations. This consultation is in line with the recommendation provided by the developers.
KEYWORDS: artificial intelligence, health technology assessment (HTA), inter-rater reliability, randomized controlled trial, risk of bias, systematic review
| INTRODUCTION

Knowledge synthesis products such as health technology assessments (HTAs) and systematic reviews (SRs) are important sources of high-quality information for policy makers, and the production of these assessments and reports requires expertise, human resources, and time.1 Given the proliferation of health technologies and