Purpose
To assess the diagnostic accuracy of ChatGPT-4V in interpreting sets of four chest CT slices from COVID-19, non-small cell lung cancer (NSCLC), and control cases, and thereby to evaluate its potential as an AI tool in radiological diagnostics.
Materials and methods
In this retrospective study, 60 CT scans from The Cancer Imaging Archive, covering COVID-19, NSCLC, and control cases, were analyzed using ChatGPT-4V. A radiologist selected four CT slices from each scan for evaluation. ChatGPT-4V’s interpretations were compared against the gold-standard diagnoses and assessed by two radiologists. Statistical analyses focused on accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and examined the impact of pathology location and lobe involvement.
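As an illustration of how the listed metrics are defined for a three-class problem, the following minimal sketch computes them one-vs-rest for a single category. The function name `one_vs_rest_metrics` and the six toy labels are assumptions made for this example; they are not the study's code or data.

```python
from typing import Dict, Sequence


def one_vs_rest_metrics(
    truth: Sequence[str], predicted: Sequence[str], positive: str
) -> Dict[str, float]:
    """Treat `positive` as the target class and all other labels as negative."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, predicted):
        if t == positive and p == positive:
            tp += 1  # target case correctly identified
        elif t != positive and p == positive:
            fp += 1  # other case wrongly labeled as target
        elif t == positive and p != positive:
            fn += 1  # target case missed
        else:
            tn += 1  # other case correctly not labeled as target

    def ratio(num: int, den: int) -> float:
        # Return NaN when a metric is undefined (empty denominator).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical toy labels, for illustration only (not study data).
truth = ["COVID-19", "COVID-19", "NSCLC", "NSCLC", "control", "control"]
predicted = ["COVID-19", "NSCLC", "NSCLC", "control", "control", "control"]

nsclc = one_vs_rest_metrics(truth, predicted, "NSCLC")
print(nsclc)
```

Computing each category against the other two combined is one common convention for reporting per-class sensitivity and specificity; the abstract does not state which convention the authors used.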
Results
ChatGPT-4V showed an overall diagnostic accuracy of 56.76%. For NSCLC, sensitivity was 27.27% and specificity was 60.47%. For COVID-19, sensitivity was 13.64% and specificity was 64.29%. For control cases, sensitivity was 31.82% and specificity was 95.24%. The highest sensitivity (83.33%) was observed in cases involving all lung lobes. Chi-squared analysis indicated significant differences in sensitivity across diagnostic categories and in relation to the location and lobar involvement of pathologies.
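For readers unfamiliar with the chi-squared test of independence mentioned above, the sketch below computes the test statistic and degrees of freedom for a contingency table of detected versus missed cases per category. The function name `chi_squared_statistic` and the counts in the table are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data, and the authors' exact test setup is not given in the abstract.

```python
from typing import List, Sequence, Tuple


def chi_squared_statistic(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals: List[float] = [sum(row) for row in table]
    col_totals: List[float] = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (rows: three categories; columns: detected, missed).
toy_table = [
    [10, 10],
    [5, 15],
    [15, 5],
]

stat, dof = chi_squared_statistic(toy_table)
print(f"chi-squared = {stat:.2f}, degrees of freedom = {dof}")
```

With 2 degrees of freedom, the critical value at the 0.05 level is about 5.99, so a statistic above that threshold would be read as a significant difference between categories.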
Conclusion
ChatGPT-4V demonstrated variable diagnostic performance in chest CT interpretation, with its highest sensitivity in cases involving all lung lobes. This underscores the challenges facing cross-modal AI models like ChatGPT-4V in radiology and points toward significant areas for improvement before they can be considered dependable. The study emphasizes the importance of enhancing these models for broader, more reliable medical use.