Importance: Large language model (LLM) artificial intelligence (AI) systems have shown promise in diagnostic reasoning, but their utility in management reasoning with no clear right answers is unknown.

Objective: To determine whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources.

Design: Prospective, randomized controlled trial conducted from 30 November 2023 to 21 April 2024.

Setting: Multi-institutional study from Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia involving physicians from across the United States.

Participants: 92 practicing attending physicians and residents with training in internal medicine, family medicine, or emergency medicine.

Intervention: Five expert-developed clinical case vignettes were presented with multiple open-ended management questions and scoring rubrics created through a Delphi process. Physicians were randomized to use either GPT-4 via ChatGPT Plus in addition to conventional resources (e.g., UpToDate, Google), or conventional resources alone.

Main Outcomes and Measures: The primary outcome was the difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case.

Results: Physicians using the LLM scored higher than those using conventional resources (mean difference 6.5%, 95% CI 2.7-10.2, p<0.001). Significant improvements were seen in the management decisions (6.1%, 95% CI 2.5-9.7, p=0.001), diagnostic decisions (12.1%, 95% CI 3.1-21.0, p=0.009), and case-specific (6.2%, 95% CI 2.4-9.9, p=0.002) domains. GPT-4 users spent more time per case (mean difference 119.3 seconds, 95% CI 17.4-221.2, p=0.02). There was no significant difference between GPT-4-augmented physicians and GPT-4 alone (-0.9%, 95% CI -9.0 to 7.2, p=0.8).

Conclusions and Relevance: LLM assistance improved physician management reasoning compared to conventional resources, with particular gains in contextual and patient-specific decision-making. These findings indicate that LLMs can augment management decision-making in complex cases.

Trial Registration: ClinicalTrials.gov Identifier: NCT06208423; https://classic.clinicaltrials.gov/ct2/show/NCT06208423

Key Points
Question: Does large language model (LLM) assistance improve physician performance on complex management reasoning tasks compared to conventional resources?
Findings: In this randomized controlled trial of 92 physicians, participants using GPT-4 achieved higher scores on management reasoning than those using conventional resources (e.g., UpToDate).
Meaning: LLM assistance enhances physician management reasoning performance in complex cases with no clear right answers.
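The abstract reports the primary outcome as a mean score difference with a 95% confidence interval. The trial's exact statistical model is not specified in this summary; the following is a minimal sketch of a generic Welch-style confidence interval for a difference in group means, using synthetic scores (not trial data) and only the Python standard library.

```python
import math
import statistics

def mean_diff_ci(a, b, z=1.96):
    """Approximate 95% CI for mean(a) - mean(b), normal approximation
    with unpooled (Welch-style) variances."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return diff, (diff - z * se, diff + z * se)

# Synthetic per-physician rubric scores (percent) for illustration only:
llm_group  = [82, 78, 90, 74, 85, 79, 88, 81]
conv_group = [75, 72, 83, 70, 78, 74, 80, 76]
diff, (lo, hi) = mean_diff_ci(llm_group, conv_group)
```

A full analysis of the trial data would also need to account for repeated measures (multiple cases per physician), which this sketch ignores.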
Objectives: This short communication explores the potential, limitations, and future directions of generative artificial intelligence (GAI) in enhancing diagnostics.

Methods: This commentary reviews current applications and advancements in GAI, particularly focusing on its integration into medical diagnostics. It examines the role of GAI in supporting medical interviews, assisting in differential diagnosis, and aiding clinical reasoning through the lens of dual-process theory. The discussion is supported by recent examples and theoretical frameworks to illustrate the practical and potential uses of GAI in medicine.

Results: GAI shows significant promise in enhancing diagnostic processes by supporting the translation of patient descriptions into visual formats, providing differential diagnoses, and facilitating complex clinical reasoning. However, limitations exist, such as the potential for generating medical misinformation, known as hallucinations. Furthermore, the commentary highlights the integration of GAI with both intuitive and analytical decision-making processes in clinical diagnostics, demonstrating potential improvements in both the speed and accuracy of diagnoses.

Conclusions: While GAI presents transformative potential for medical diagnostics, it also introduces risks that must be carefully managed. Future advancements should focus on refining GAI technologies to better align with human diagnostic reasoning, ensuring that GAI enhances rather than replaces medical professionals' expertise.
Background: Generative artificial intelligence (AI), particularly in the form of large language models, has rapidly developed. The LLaMA series are popular and were recently updated from LLaMA2 to LLaMA3. However, the impact of the update on diagnostic performance has not been well documented.

Objective: We conducted a comparative evaluation of the diagnostic performance of differential diagnosis lists generated by LLaMA3 and LLaMA2 for case reports.

Methods: We analyzed case reports published in the American Journal of Case Reports from 2022 to 2023. After excluding nondiagnostic and pediatric cases, we input the remaining cases into LLaMA3 and LLaMA2 using the same prompt and the same adjustable parameters. Diagnostic performance was defined by whether the differential diagnosis lists included the final diagnosis. Multiple physicians independently evaluated whether the final diagnosis was included in the top 10 differentials generated by LLaMA3 and LLaMA2.

Results: In our comparative evaluation of the diagnostic performance of LLaMA3 and LLaMA2, we analyzed differential diagnosis lists for 392 case reports. The final diagnosis was included in the top 10 differentials generated by LLaMA3 in 79.6% (312/392) of the cases, compared to 49.7% (195/392) for LLaMA2, indicating a statistically significant improvement (P<.001). Additionally, LLaMA3 showed higher performance in including the final diagnosis in the top 5 differentials, observed in 63% (247/392) of cases, compared to LLaMA2's 38% (149/392, P<.001). Furthermore, the top diagnosis was accurately identified by LLaMA3 in 33.9% (133/392) of cases, significantly higher than the 22.7% (89/392) achieved by LLaMA2 (P<.001). The analysis across various medical specialties revealed variations in diagnostic performance, with LLaMA3 consistently outperforming LLaMA2.

Conclusions: The results reveal that the LLaMA3 model significantly outperforms LLaMA2 in diagnostic performance, with a higher percentage of case reports having the final diagnosis listed within the top 10, within the top 5, and as the top diagnosis. Overall diagnostic performance improved almost 1.5 times from LLaMA2 to LLaMA3. These findings support the rapid development and continuous refinement of generative AI systems to enhance diagnostic processes in medicine. However, these findings should be carefully interpreted for clinical application, as generative AI, including the LLaMA series, has not been approved for medical applications such as AI-enhanced diagnostics.
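The evaluation metric described above is a top-k "hit rate": the fraction of cases whose generated differential list contains the final diagnosis. The counts below are taken from the abstract; the helper function and variable names are illustrative assumptions, not the authors' code.

```python
def hit_rate(hits, total):
    """Fraction of cases whose differential list contained the final diagnosis."""
    return hits / total

n = 392  # case reports analyzed

# Top-10 inclusion rates reported in the abstract:
llama3_top10 = hit_rate(312, n)  # reported as 79.6%
llama2_top10 = hit_rate(195, n)  # reported as 49.7%

# Top-1 (exact top diagnosis) rates:
llama3_top1 = hit_rate(133, n)   # reported as 33.9%
llama2_top1 = hit_rate(89, n)    # reported as 22.7%
```

Because both models were run on the same 392 cases, the reported P values would come from a paired comparison of these per-case outcomes; the abstract does not name the specific test used.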