Background: Large language models (LLMs) have shown promise in answering medical licensing examination-style questions, but there is limited research on how multimodal LLMs perform on subspecialty medical examinations. Our study benchmarks the performance of multimodal LLMs, enhanced by model prompting strategies, on gastroenterology subspecialty examination-style questions and examines how these strategies incrementally improve overall performance.

Methods: We used the 2022 American College of Gastroenterology (ACG) self-assessment examination (N=300), a test typically completed by gastroenterology fellows and established gastroenterologists preparing for the gastroenterology subspecialty board examination. We sequentially implemented four model prompting strategies: prompt engineering, Retrieval-Augmented Generation (RAG), five-shot learning, and an LLM-powered answer validation revision model (AVRM). We tested GPT-4 and Gemini Pro.

Results: Implementing all prompting strategies improved the overall score of GPT-4 from 60.3% to 80.7% and of Gemini Pro from 48.0% to 54.3%. GPT-4's score surpassed both the 70% passing threshold and the 75% average human test-taker score; Gemini Pro's did not. When questions were stratified by difficulty, the accuracy of both LLMs mirrored that of human examinees, rising as human test-taker accuracy increased. Adding the AVRM to prompt engineering, RAG, and five-shot learning increased GPT-4's accuracy by a further 4.4%. The incremental addition of prompting strategies improved accuracy on both non-image (57.2% to 80.4%) and image-based (63.0% to 80.9%) questions for GPT-4, but not for Gemini Pro.

Conclusions: Our results underscore the value of model prompting strategies in improving LLM performance on subspecialty-level licensing examination questions.
We also present a novel implementation of an LLM-powered reviewer model in the context of subspecialty medicine, which further improved model performance when combined with other prompting strategies. Our findings highlight the potential future role of multimodal LLMs, particularly when multiple model prompting strategies are combined, as clinical decision support systems for healthcare providers in subspecialty care.

Keywords: ChatGPT, Gemini Pro, gastroenterology, RAG, prompt engineering, medical specialty examination.
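The sequential pipeline described in Methods (prompt engineering, then RAG context, then five-shot examples, then an AVRM review pass) can be sketched as prompt assembly plus a second-model validation step. This is a minimal illustrative sketch, not the study's actual implementation: the instruction text, retrieval corpus, few-shot examples, and the `model`/`reviewer` callables are all hypothetical placeholders.

```python
def build_prompt(question: str,
                 retrieved_passages: list[str],
                 few_shot_examples: list[tuple[str, str]]) -> str:
    """Assemble one prompt: engineered instructions + RAG context + few-shot Q/A pairs."""
    # Prompt engineering: role and answer-format instructions (illustrative wording).
    parts = ["You are a board-certified gastroenterologist. "
             "Answer with a single letter (A-D)."]
    # RAG: prepend any retrieved reference passages.
    if retrieved_passages:
        parts.append("Reference material:\n" + "\n".join(retrieved_passages))
    # Few-shot learning: worked question/answer examples (five in the study).
    for q, a in few_shot_examples:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

def answer_with_validation(question, retrieved, shots, model, reviewer):
    """AVRM step: a second LLM reviews the first model's draft answer and may revise it."""
    draft = model(build_prompt(question, retrieved, shots))
    review_prompt = (f"Question: {question}\n"
                     f"Proposed answer: {draft}\n"
                     "If the proposed answer is wrong, reply with the corrected letter; "
                     "otherwise repeat it.")
    return reviewer(review_prompt)
```

In practice `model` and `reviewer` would wrap API calls to the LLMs under test; here they are plain callables so the assembly logic can be exercised without network access.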