PURPOSE:
Artificial intelligence (AI) offers considerable promise for retinopathy of prematurity (ROP) screening and diagnosis. The development of deep-learning algorithms to detect the presence of disease may contribute to sufficient screening, early detection, and timely treatment for this preventable blinding disease. This review aimed to systematically examine the literature in AI algorithms in detecting ROP. Specifically, we focused on the performance of deep-learning algorithms through sensitivity, specificity, and area under the receiver operating curve (AUROC) for both the detection and grade of ROP.
METHODS:
We searched Medline OVID, PubMed, Web of Science, and Embase for studies published from January 1, 2012, to September 20, 2021. Studies evaluating the diagnostic performance of deep-learning models based on retinal fundus images with expert ophthalmologists' judgment as reference standard were included. Studies which did not investigate the presence or absence of disease were excluded. Risk of bias was assessed using the QUADAS-2 tool.
RESULTS:
Twelve studies out of the 175 studies identified were included. Five studies measured the performance of detecting the presence of ROP and seven studies determined the presence of plus disease. The average AUROC out of 11 studies was 0.98. The average sensitivity and specificity for detecting ROP was 95.72% and 98.15%, respectively, and for detecting plus disease was 91.13% and 95.92%, respectively.
CONCLUSION:
The diagnostic performance of deep-learning algorithms in published studies was high. Few studies presented externally validated results or compared performance to expert human graders. Large scale prospective validation alongside robust study design could improve future studies.