Objective:
Artificial intelligence has recently become available for widespread use in medicine, including the interpretation of digitized information, the analysis of big data for tracking disease trends and patterns, and clinical diagnosis. Comparative studies and expert opinion support the validity of artificial intelligence for imaging and data analysis, yet similar validation is lacking for clinical diagnosis. Here, artificial intelligence programs are compared with a diagnostic generator program in clinical neurology.
Methods:
Using 4 nonrandomly selected case records from New England Journal of Medicine clinicopathologic conferences held from 2017 to 2022, 2 artificial intelligence programs (ChatGPT-4 and GLASS AI) were compared with a neurological diagnostic generator program (NeurologicDx.com) for diagnostic capability, diagnostic accuracy, and source authentication.
Results:
Compared with NeurologicDx.com, the 2 AI programs produced results that varied with the order of key term entry and with repeat querying. The diagnostic generator yielded more differential diagnostic entities and identified the correct diagnosis in 4 of 4 test cases, versus 0 of 4 for ChatGPT-4 and 1 of 4 for GLASS AI, and, unlike the AI programs, provided source authentication for its diagnostic entities.
Conclusions:
The diagnostic generator NeurologicDx yielded a more robust and reproducible differential diagnostic list, with higher diagnostic accuracy and associated source authentication, than the artificial intelligence programs tested.