Background and hypothesis
In November 2022, OpenAI released ChatGPT, a chatbot capable of processing natural language to produce human-like conversational dialogue. It has generated considerable interest, including within the scientific and medical communities. Recent publications have shown that ChatGPT can correctly answer questions from medical examinations such as the United States Medical Licensing Examination (USMLE) and other specialty exams. To date, no study anywhere in the world has tested ChatGPT on specialty examination questions in the field of nephrology.
Methods
In this comparative cross-sectional study, we used the ChatGPT-3.5 and ChatGPT-4.0 algorithms to analyze 1560 single-answer questions from the national specialty examination in nephrology (2017 to 2023) that were available, together with answer keys, in the Polish Medical Examination Center's question database.
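As an illustration only, the sketch below shows one way such an evaluation could be automated in Python. The abstract does not describe the study's tooling; the model identifiers, prompt wording, and use of the OpenAI API rather than the web interface are assumptions made here for demonstration.

```python
# Hypothetical sketch: score each single-answer question against the official
# answer key. The study does not specify this tooling; model names and prompt
# wording below are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_model(stem: str, options: dict[str, str], model: str = "gpt-4") -> str:
    """Pose one multiple-choice question and return the letter the model picks."""
    prompt = stem + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer with a single letter (A-E) only."},
            {"role": "user", "content": prompt},
        ],
        temperature=0,  # deterministic choice for reproducible scoring
    )
    return response.choices[0].message.content.strip()[0].upper()


def accuracy(questions: list[dict], model: str) -> float:
    """Fraction of questions where the model's answer matches the answer key."""
    correct = sum(
        ask_model(q["stem"], q["options"], model) == q["key"] for q in questions
    )
    return correct / len(questions)
```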
Results
Of the 1556 questions posed to ChatGPT-4.0, correct answers were obtained with an accuracy of 69.84%, compared with ChatGPT-3.5 (45.70%, P = .0001) and with the top results of medical doctors (85.73%, P = .0001). ChatGPT-4.0 achieved the required ≥60% pass threshold in 11 of the 13 tests and scored higher than the average human exam results.
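The P values above reflect comparisons of correct-answer proportions. Purely as a hedged illustration (the abstract does not name the statistical test used, and the same question count for both models is assumed), the comparison between the two model versions could be reproduced with a two-proportion z-test:

```python
# Illustrative only: the abstract does not state which test was used, so a
# standard two-proportion z-test (statsmodels) is assumed here.
from statsmodels.stats.proportion import proportions_ztest

n = 1556                            # questions posed (assumed equal for both models)
correct_gpt4 = round(0.6984 * n)    # ~1087 correct for ChatGPT-4.0
correct_gpt35 = round(0.4570 * n)   # ~711 correct for ChatGPT-3.5

stat, p_value = proportions_ztest(
    count=[correct_gpt4, correct_gpt35],  # successes in each group
    nobs=[n, n],                          # trials in each group
)
print(f"z = {stat:.2f}, P = {p_value:.4g}")  # P well below .001, consistent with the report
```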
Conclusion
ChatGPT-3.5 was not notably successful on the nephrology exams, whereas the ChatGPT-4.0 algorithm was able to pass most of the analyzed nephrology specialty exams. Newer generations of ChatGPT achieve results similar to those of humans, although the best human results remain better than those of ChatGPT-4.0.