Background
Artificial intelligence (AI) language models have shown potential as educational tools in healthcare, but their accuracy and reliability in periodontology education require further evaluation. In this study, we aimed to assess and compare the performance of three prominent AI language models (ChatGPT-4o, Claude 3 Opus, and Gemini Advanced) with that of second-year periodontics residents across the United States on the American Academy of Periodontology 2024 in-service examination.

Methods
We conducted a cross-sectional study using the 331 multiple-choice questions from the 2024 periodontology in-service examination. We evaluated and compared the performance of ChatGPT-4o, Claude 3 Opus, and Gemini Advanced across the examination's question domains. The results of second-year periodontics residents served as the benchmark.

Results
ChatGPT-4o, Gemini Advanced, and Claude 3 Opus significantly outperformed second-year periodontics residents across the United States, with accuracy rates of 92.7 percent, 81.6 percent, and 78.5 percent, respectively, compared with the residents' 61.9 percent. The differences in performance among the AI models were statistically significant (p < 0.001). Percentile rankings underscored the superior performance of the AI models, with ChatGPT-4o, Gemini Advanced, and Claude 3 Opus placing in the 99.95th, 98th, and 95th percentiles, respectively.

Conclusion
ChatGPT-4o displayed superior performance compared with Claude 3 Opus and Gemini Advanced. These results highlight the potential of AI large language models (LLMs) as educational tools in periodontology and emphasize the need for ongoing evaluation and validation as these technologies evolve. Researchers should explore both the integration of AI language models into periodontal education and their impact on learning outcomes and clinical decision-making.
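
The abstract does not name the statistical test behind the reported p < 0.001. The sketch below shows one plausible way the pairwise accuracy comparisons in the Results could be run, assuming a chi-squared test on 2x2 tables of correct/incorrect counts out of the 331 questions; treating the residents' mean score as a single 331-question attempt is an illustrative simplification, not the authors' stated method.

```python
# Minimal sketch of pairwise accuracy comparisons, under the assumptions above.
from itertools import combinations
from scipy.stats import chi2_contingency

N_QUESTIONS = 331  # number of multiple-choice items on the 2024 exam

# Accuracy rates taken from the Results section.
accuracy = {
    "ChatGPT-4o": 0.927,
    "Gemini Advanced": 0.816,
    "Claude 3 Opus": 0.785,
    "Residents (mean)": 0.619,  # simplification: treated as one 331-item attempt
}

def correct_counts(rate: float, n: int = N_QUESTIONS) -> tuple[int, int]:
    """Convert an accuracy rate into (correct, incorrect) counts out of n items."""
    correct = round(rate * n)
    return correct, n - correct

# Chi-squared test on each pair's 2x2 contingency table of correct/incorrect counts.
for (name_a, rate_a), (name_b, rate_b) in combinations(accuracy.items(), 2):
    table = [correct_counts(rate_a), correct_counts(rate_b)]
    chi2, p_value, _, _ = chi2_contingency(table)
    print(f"{name_a} vs {name_b}: chi2 = {chi2:.2f}, p = {p_value:.4g}")
```

Because every test taker answered the same 331 items, a paired test such as McNemar's on the per-question response matrix would be a stronger design if item-level data were available; the unpaired chi-squared comparison above only needs the aggregate rates reported in the abstract.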