Large language models as decision aids in neuro-oncology: a review of shared decision-making applications

Lawson McLean, Aaron; Wu, Yonghui; Lawson McLean, Anna C.; Hristidis, Vagelis

doi:10.1007/s00432-024-05673-x

J Cancer Res Clin Oncol

2024

DOI: 10.1007/s00432-024-05673-x

|View full text |Cite

Large language models as decision aids in neuro-oncology: a review of shared decision-making applications

Aaron Lawson McLean,

Yonghui Wu,

Anna C. Lawson McLean

et al.

Abstract: Shared decision-making (SDM) is crucial in neuro-oncology, fostering collaborations between patients and healthcare professionals to navigate treatment options. However, the complexity of neuro-oncological conditions and the cognitive and emotional burdens on patients present significant barriers to achieving effective SDM. This discussion explores the potential of large language models (LLMs) such as OpenAI's ChatGPT and Google's Bard to overcome these barriers, offering a means to enhance patient understandi… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

Supporting

Mentioning

Contrasting

Year Published

2024

Publication Types

Select...

Article3

Relationship

Self Cite0

Independent3

Authors

Journals

Cited by 3 publications

References 64 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

Accuracy and consistency of publicly available Large Language Models as clinical decision support tools for the management of colon cancer

Kaiser,

Hughes,

Yang

et al. 2024

Journal of Surgical Oncology

View full text Add to dashboard Cite

BackgroundLarge Language Models (LLM; e.g., ChatGPT) may be used to assist clinicians and form the basis of future clinical decision support (CDS) for colon cancer. The objectives of this study were to (1) evaluate the response accuracy of two LLM‐powered interfaces in identifying guideline‐based care in simulated clinical scenarios and (2) define response variation between and within LLMs.MethodsClinical scenarios with “next steps in management” queries were developed based on National Comprehensive Cancer Network guidelines. Prompts were entered into OpenAI ChatGPT and Microsoft Copilot in independent sessions, yielding four responses per scenario. Responses were compared to clinician‐developed responses and assessed for accuracy, consistency, and verbosity.ResultsAcross 108 responses to 27 prompts, both platforms yielded completely correct responses to 36% of scenarios (n = 39). For ChatGPT, 39% (n = 21) were missing information and 24% (n = 14) contained inaccurate/misleading information. Copilot performed similarly, with 37% (n = 20) having missing information and 28% (n = 15) containing inaccurate/misleading information (p = 0.96). Clinician responses were significantly shorter (34 ± 15.5 words) than both ChatGPT (251 ± 86 words) and Copilot (271 ± 67 words; both p < 0.01).ConclusionsPublicly available LLM applications often provide verbose responses with vague or inaccurate information regarding colon cancer management. Significant optimization is required before use in formal CDS.

show abstract

Accuracy and consistency of publicly available Large Language Models as clinical decision support tools for the management of colon cancer

Kaiser,

Hughes,

Yang

et al. 2024

Journal of Surgical Oncology

View full text Add to dashboard Cite

show abstract

Fine-Tuning LLMs for Specialized Use Cases

Anisuzzaman,

Malins,

Friedman

et al. 2024

Mayo Clinic Proceedings: Digital Health

View full text Add to dashboard Cite

Autonomous medical evaluation for guideline adherence of large language models

Fast,

Adams,

Busch

et al. 2024

npj Digit. Med.

View full text Add to dashboard Cite

Autonomous Medical Evaluation for Guideline Adherence (AMEGA) is a comprehensive benchmark designed to evaluate large language models’ adherence to medical guidelines across 20 diagnostic scenarios spanning 13 specialties. It includes an evaluation framework and methodology to assess models’ capabilities in medical reasoning, differential diagnosis, treatment planning, and guideline adherence, using open-ended questions that mirror real-world clinical interactions. It includes 135 questions and 1337 weighted scoring elements designed to assess comprehensive medical knowledge. In tests of 17 LLMs, GPT-4 scored highest with 41.9/50, followed closely by Llama-3 70B and WizardLM-2-8x22B. For comparison, a recent medical graduate scored 25.8/50. The benchmark introduces novel content to avoid the issue of LLMs memorizing existing medical data. AMEGA’s publicly available code supports further research in AI-assisted clinical decision-making, aiming to enhance patient care by aiding clinicians in diagnosis and treatment under time constraints.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Large language models as decision aids in neuro-oncology: a review of shared decision-making applications

Cited by 3 publications

References 64 publications

Accuracy and consistency of publicly available Large Language Models as clinical decision support tools for the management of colon cancer

Accuracy and consistency of publicly available Large Language Models as clinical decision support tools for the management of colon cancer

Fine-Tuning LLMs for Specialized Use Cases

Autonomous medical evaluation for guideline adherence of large language models

Contact Info

Product

Resources

About