Introduction:
In this retrospective cohort study, we wanted to evaluate the performance and analyze the insights of an artificial intelligence (AI) algorithm in detecting retinal fluid in spectral-domain OCT volume scans from a large cohort of patients with neovascular age-related macular degeneration (AMD) and diabetic macular edema (DME).
Methods:
A total of 3’981 OCT volumes from 374 patients with AMD and 11’501 OCT volumes from 811 patients with DME, acquired with Heidelberg Spectralis OCT device (Heidelberg Engineering Inc., Heidelberg, Germany) between 2013 and 2021. Each OCT volume was annotated for the presence or absence of intraretinal fluid (IRF) and subretinal fluid (SRF) by masked reading center graders (ground truth). The performance of an already published AI-algorithm to detect IRF, SRF separately and a combined fluid detector (IRF and/or SRF) of the same OCT volumes was evaluated. An analysis of the sources of disagreement between annotation and prediction and their relationship to central retinal thickness was performed. We computed the mean areas under the curves (AUC) and under the precision-recall curves (AP), accuracy, sensitivity, specificity and precision.
Results:
The AUC for IRF was 0.92 and 0.98, for SRF 0.98 and 0.99, in the AMD and DME cohort, respectively. The AP for IRF was 0.89 and 1.00, for SRF 0.97 and 0.93, in the AMD and DME cohort, respectively. The accuracy, specificity and sensitivity for IRF was 0.87, 0.88, 0.84, and 0.93, 0.95, 0.93, and for SRF 0.93, 0.93, 0.93, and 0.95, 0.95, 0.95 in the AMD and DME cohort respectively. For detecting any fluid, the AUC was 0.95 and 0.98, the accuracy, specificity and sensitivity was 0.89, 0.93, 0.90 and 0.95, 0.88 and 0.93, in the AMD and DME cohort, respectively. False positives were present when retinal shadow artifacts and strong retinal deformation were present. False negatives were due to small hyporeflective areas in combination with poor image quality. The combined detector correctly predicted more OCT volumes than the single detectors for IRF and SRF, 89.0% versus 81.6% in the AMD and 93.1% versus 88.6% in the DME cohort.
Discussion/Conclusion:
The AI-based fluid detector achieves high performance for retinal fluid detection in a very large dataset dedicated to AMD and DME. Combining single detectors provides better fluid detection accuracy than considering the single detectors separately. The observed independence of the single detectors ensures that the detectors learned features particular to IRF and SRF.