Purpose
To conduct a prospective observational study across 12 U.S. hospitals to
evaluate real-time performance of an interpretable artificial
intelligence (AI) model to detect COVID-19 on chest radiographs.
Materials and Methods
A total of 95 363 chest radiographs were included in model
training, external validation, and real-time validation. The model was
deployed as a clinical decision support system, and performance was
prospectively evaluated. There were 5335 total real-time predictions and
a COVID-19 prevalence of 4.8% (258 of 5335). Model performance was
assessed with use of receiver operating characteristic analysis,
precision-recall curves, and F1 score. Logistic regression was used to
evaluate the association of race and sex with AI model diagnostic
accuracy. To compare model accuracy with the performance of
board-certified radiologists, a third dataset of 1638 images was read
independently by two radiologists.
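
The metrics named above can be illustrated with a minimal sketch. It assumes binary labels (1 = COVID-19 positive) and model scores in [0, 1]. The function names, the 0.5 threshold, and the toy data are hypothetical and are not taken from the study; only the prevalence counts (258 of 5335) come from the text.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a
    positive case scores higher than a negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_f1(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data for illustration only (not study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.1, 0.7, 0.2, 0.1, 0.0, 0.0]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_precision, toy_recall, toy_f1 = precision_recall_f1(
    toy_labels, toy_scores, threshold=0.5  # threshold is an assumption
)

# Prevalence from the counts reported in the text: 258 of 5335.
prevalence = 258 / 5335
print(f"AUC={toy_auc:.3f} F1={toy_f1:.3f} prevalence={prevalence:.1%}")
```

At a prevalence this low, precision-recall curves and F1 score are informative alongside receiver operating characteristic analysis, because precision depends directly on the ratio of positive to negative cases.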
Results
Participants positive for COVID-19 had higher COVID-19 diagnostic scores
than participants negative for COVID-19 (median, 0.1 [IQR,
0.0–0.8] vs 0.0 [IQR, 0.0–0.1], respectively; P < .001).
Real-time model performance was
unchanged over 19 weeks of implementation (area under the receiver
operating characteristic curve, 0.70; 95% CI: 0.66, 0.73). Model
sensitivity was higher in men than women (P = .01), whereas model
specificity was higher in women (P = .001). Sensitivity was higher for
Asian (P = .002) and Black (P = .046) participants compared with White
participants. The COVID-19 AI diagnostic system had worse accuracy
(63.5% correct) compared with radiologist predictions (radiologist 1 =
67.8% correct, radiologist 2 = 68.6% correct; McNemar P < .001 for
both).
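
The paired comparison between the model and each radiologist can be sketched as follows. This is a minimal illustration of the exact (binomial) form of the McNemar test. The text does not say which variant was used and does not report the discordant-pair counts, so the function name and the example counts below are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from discordant pair counts.

    b: images the first reader classified correctly and the second did not
    c: images the second reader classified correctly and the first did not
    Images on which both readers agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for illustration only (not study data).
p_balanced = mcnemar_exact(5, 5)
p_one_sided = mcnemar_exact(0, 10)
print(p_balanced, p_one_sided)
```

Because the test uses only the images on which the two readers disagree, a difference of a few percentage points in accuracy can still be significant when the dataset is large.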
Conclusion
AI-based tools have not yet reached full diagnostic potential for
COVID-19 and underperform compared with radiologist prediction.
Keywords: Diagnosis, Classification, Application Domain, Infection, Lung
Supplemental material is available for this article.
© RSNA, 2022