AI assistance improved radiologists' performance in distinguishing COVID-19 from pneumonia of other etiology on chest CT.
Key Results: An AI model had higher test accuracy (96% vs 85%, p<0.001), sensitivity (95% vs 79%, p<0.001), and specificity (96% vs 88%, p=0.002) than radiologists. In an independent test set, our AI model achieved an accuracy of 87%, sensitivity of 89% and specificity of 86%. With AI assistance, the radiologists achieved a higher average accuracy (90% vs 85%, p<0.001), sensitivity (88% vs 79%, p<0.001) and specificity (91% vs 88%, p=0.001).
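As a reading aid, the three metrics reported above can be computed from binary predictions as in the minimal sketch below. It assumes label 1 denotes COVID-19 and label 0 denotes other pneumonia; the function name and the toy labels are hypothetical and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumption: 1 = COVID-19 (positive class), 0 = other pneumonia.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined when a class is absent; report NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # recall on COVID-19 cases
        "specificity": ratio(tn, tn + fp),  # recall on other pneumonia
    }


# Toy example (made-up labels, not study data).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```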
Abstract
Background: COVID-19 and pneumonia of other etiology share similar CT characteristics, contributing to the challenges in differentiating them with high accuracy.
Purpose: To establish and evaluate an artificial intelligence (AI) system in differentiating COVID-19 and other pneumonia on chest CT and assess radiologist performance without and with AI assistance.
Methods: 521 patients with positive RT-PCR for COVID-19 and abnormal chest CT findings were retrospectively identified from ten hospitals from January 2020 to April 2020. 665 patients with non-COVID-19 pneumonia and definite evidence of pneumonia on chest CT were retrospectively selected from three hospitals between 2017 and 2019. To classify COVID-19 versus other pneumonia for each patient, abnormal CT slices were input into the EfficientNet B4 deep neural network architecture after lung segmentation, followed by a two-layer fully-connected neural network to pool slices together. Our final cohort of 1,186 patients (132,583 CT slices) was divided into training, validation and test sets in a 7:2:1 and equal ratio. Independent testing was performed by evaluating model performance on separate hospitals. Studies were blindly reviewed by six radiologists without and then with AI assistance.
Results: Our final model achieved a test accuracy of 96% (95% CI: 90-98%), sensitivity of 95% (95% CI: 83-100%) and specificity of 96% (95% CI: 88-99%) with Receiver Operating Characteristic (ROC) AUC of 0.95 and Precision-Recall (PR) AUC of 0.90. On independent testing, our model achieved an accuracy of 87% (95% CI: 82-90%), sensitivity of 89% (95% CI: 81-94%) and specificity of 86% (95% CI: 80-90%) with ROC AUC of 0.90 and PR AUC of 0.87. Assisted by the models' probabilities, the radiologists achieved a higher average test accuracy (90% vs. 85%, ∆=5, p<0.001), sensitivity (88% vs. 79%, ∆=9, p<0.001) and specificity (91% vs. 88%, ∆=3, p=0.001).
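The accuracy, sensitivity, and specificity values in the Results are proportions, each reported with a 95% confidence interval. The abstract does not state which interval method was used, so the sketch below assumes a Wilson score interval purely for illustration; the function name and the example counts are hypothetical and are not study data.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: z = 1.96 approximates a two-sided 95% interval.
    The source does not say that this method was used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p_hat + z2 / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical example: 8 correct calls out of 10 cases.
low, high = wilson_interval(8, 10)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves sensibly when a proportion is close to 100%, which is relevant for subgroups with few cases.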