2023
DOI: 10.1016/j.bja.2023.04.017
Assessment of ChatGPT success with specialty medical knowledge using anaesthesiology board examination practice questions

Cited by 31 publications (7 citation statements)
References 3 publications
“…Similar findings were observed in other board certification examinations. 27 In general, highly specialized models with questions that are answerable by yes or no answers are more accurate, 36 whereas highly versatile models are less accurate. 37 An LLM is a multifunctional model that has not been trained in a specific domain; however, it is possible to fine-tune these models on specific tasks or domains to improve their performance in those areas.…”
Section: Discussion
Confidence: 99%
“…A previous study on LLMs used similar methods, making the LLM answer the same questions twice and evaluating the agreement. 27 The primary outcome was the proportion of correct answers to the questions without images, under the same conditions as those encountered by examinees. Secondary outcomes included correct answers to all answerable questions, those with images, and those with stand-alone and scenario-based items.…”
Section: LLM
Confidence: 99%
“…Several studies focused on ChatGPT's performance in medical knowledge tests, including licensing examinations for physicians, anesthesia, ophthalmology, neurology, and other specialty examinations [31][32][33][34]. Overall, ChatGPT demonstrated passing scores in most countries' licensing and specialty exams, but generally scored only slightly above the passing line, and did not achieve accuracy rates above 95% in any licensing exam.…”
Section: Medical Exam Performance and Exam Preparation With ChatGPT
Confidence: 99%
“…Although the sample size is insufficient to generalize the findings to other fields of expertise or users, it did provide a paradigm of ChatGPT-assisted training in new knowledge and techniques. We can foresee the flourishing development of AI chatbots being applied in medicine or health care [28][29][30][31][32][33][34][35][36]. ChatGPT represents a paradigm shift in the field of virtual assistants.…”
Section: Principal Findings
Confidence: 99%