BACKGROUND: Case studies have shown that ChatGPT can run clinical simulations at the medical student level. However, no data have assessed ChatGPT's reliability in meeting desired simulation criteria such as medical accuracy, simulation formatting, and robust feedback mechanisms.

OBJECTIVE: To quantify ChatGPT's ability to consistently follow formatting instructions and create simulations for preclinical medical student learners according to principles of medical simulation and multimedia educational technology.

METHODS: Using ChatGPT-4 and a pre-validated starting prompt, the authors ran 360 separate simulations of an acute asthma exacerbation; 180 simulations were given correct answers and 180 were given incorrect answers. ChatGPT was evaluated on its adherence to basic simulation parameters (stepwise progression, free response, interactivity), advanced simulation parameters (autonomous conclusion, delayed feedback, comprehensive feedback), and medical accuracy (vignette, treatment updates, feedback). Significance was determined with chi-squared analyses, with 95% confidence intervals reported for odds ratios.

RESULTS: All simulations (100%) met the basic simulation parameters and were medically accurate. For advanced parameters, 55% of all simulations delayed feedback, and the Correct arm delayed feedback significantly more often than the Incorrect arm (87% vs 24%; p<0.001). 79% of simulations concluded autonomously, with no difference between the Correct and Incorrect arms (81% vs 77%; p=0.364). 78% of simulations gave comprehensive feedback, again with no difference between arms (76% vs 81%; p=0.306). ChatGPT-4 was significantly more likely to conclude simulations autonomously (p<0.001) and to provide comprehensive feedback (p<0.001) when feedback was delayed than when it was not.

CONCLUSIONS: ChatGPT simulations performed perfectly on medical accuracy and the basic simulation parameters, and performed well on comprehensive feedback and autonomous conclusion. Delayed feedback depended on the accuracy of user inputs. A simulation meeting one advanced parameter was more likely to meet all advanced parameters. These simulations have the potential to be a reliable educational tool for simple scenarios and can be evaluated with a novel nine-part metric. Further work is needed to ensure consistent performance across a broader range of simulation scenarios.
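To illustrate the kind of analysis the METHODS section describes (chi-squared test plus a 95% confidence interval for an odds ratio), the sketch below compares delayed-feedback rates between the Correct and Incorrect arms. This is not the authors' code; the counts (157/180 and 43/180) are approximations reconstructed from the reported 87% and 24%, and the confidence interval uses a standard Wald approximation on the log odds ratio.

```python
# Minimal sketch, not the study's actual analysis script.
# Counts are reconstructed from reported percentages and are approximate.
import numpy as np
from scipy.stats import chi2_contingency

correct_delayed, correct_not = 157, 23       # ~87% of 180 Correct-arm runs
incorrect_delayed, incorrect_not = 43, 137   # ~24% of 180 Incorrect-arm runs

table = np.array([[correct_delayed, correct_not],
                  [incorrect_delayed, incorrect_not]])

# Chi-squared test of independence between arm and delayed feedback
chi2, p, dof, expected = chi2_contingency(table)

# Wald 95% confidence interval for the odds ratio
or_hat = (correct_delayed * incorrect_not) / (correct_not * incorrect_delayed)
se_log_or = np.sqrt(1 / correct_delayed + 1 / correct_not +
                    1 / incorrect_delayed + 1 / incorrect_not)
ci_low, ci_high = np.exp(np.log(or_hat) + np.array([-1.96, 1.96]) * se_log_or)

print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
print(f"OR = {or_hat:.1f}, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")
```

With these approximate counts the odds ratio is roughly 22 with p well below 0.001, consistent with the direction of the reported result, though the exact figures depend on the true underlying counts.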