This study compares the performance of first- and second-year medical students with that of 2 models of a popular chatbot on free-response clinical reasoning examinations.
Introduction: New daily persistent headache (NDPH) is a primary headache disorder characterized by an intractable, daily, and unremitting headache lasting for at least 3 months. Currently, there are limited studies describing the characteristics of NDPH in the pediatric population.
Objective: To describe the characteristics of NDPH in pediatric patients presenting to a headache program at a tertiary referral center.
Methods: The participants were pediatric patients who attended the Headache Clinic at Children's National Hospital between 2016 and 2018. All patients seen in the Headache Clinic were enrolled in an institutional review board–approved patient registry.
Results: Between 2016 and 2018, NDPH was diagnosed in 245 patients, representing 14% of the total headache population. NDPH patients were predominantly female (78%) and white (72%). The median age was 14.8 years. The median pain intensity was 6 of 10 (standard deviation = 1.52). Most patients reported migrainous features, namely photophobia (85%), phonophobia (85%), and a reduced activity level (88%). Overall, 33% of patients had failed at least 1 preventive medication, and 56% had failed at least 1 abortive medication. Furthermore, 36% of patients were additionally diagnosed with medication overuse headache.
Conclusion: NDPH is a relatively frequent disorder among pediatric chronic headache patients. The vast majority of these patients experience migrainous headache characteristics and associated symptoms and are highly refractory to treatment, as evidenced by a strong predisposition to medication overuse headache and high rates of failed preventive management.
Importance: Studies show that ChatGPT, a general-purpose large language model chatbot, could pass the multiple-choice US Medical Licensing Exams, but the model's performance on open-ended clinical reasoning is unknown.
Objective: To determine whether ChatGPT can consistently meet the passing threshold on free-response, case-based clinical reasoning assessments.
Design: Fourteen multi-part cases were selected from clinical reasoning exams administered to pre-clerkship medical students between 2019 and 2022. For each case, the questions were run through ChatGPT twice and the responses were recorded. Two clinician educators independently graded each run according to a standardized grading rubric. To further assess the degree of variation in ChatGPT's performance, we repeated the analysis on a single high-complexity case 20 times.
Setting: A single US medical school.
Participants: ChatGPT.
Main Outcomes and Measures: Passing rate of ChatGPT's scored responses and the range in model performance across multiple run-throughs of a single case.
Results: 12 of the 28 ChatGPT exam responses (43%) achieved a passing score, with a mean score of 69% (95% CI: 65% to 73%) compared with the established passing threshold of 70%. When given the same case 20 separate times, ChatGPT's performance varied, with scores ranging from 56% to 81%.
Conclusions and Relevance: ChatGPT's ability to achieve a passing performance in nearly half of the cases analyzed demonstrates the need to revise clinical reasoning assessments and to incorporate artificial intelligence (AI)-related topics into medical curricula and practice.