Large language model (ChatGPT) as a support tool for breast tumor board

Sorin, Vera; Klang, Eyal; Sklair‐Levy, Miri; Cohen, Ilan; Zippel, Douglas; Lahat, Nora Balint; Konen, Eli; Barash, Yiftach

doi:10.1038/s41523-023-00557-8

Cited by 102 publications

(57 citation statements)

References 11 publications


“…Previous reports have shown good performance for well-defined tasks in radiology, dermatology, or pathology. More recently developed LLMs might also help to deal with more complex tasks in medicine, such as clinical decision-support in organ-specific tumor boards to facilitate the implementation of existing guidelines. Integrating multidimensional data beyond established guidelines is an additional challenge typically faced in precision oncology and molecular tumor boards, making this a compelling use case for LLMs.…”

Section: Discussion (mentioning)

confidence: 99%

Leveraging Large Language Models for Decision Support in Personalized Oncology

Benary, Wang, Schmidt et al. 2023

JAMA Netw Open


Importance: Clinical interpretation of complex biomarkers for precision oncology currently requires manual investigations of previous studies and databases. Conversational large language models (LLMs) might be beneficial as automated tools for assisting clinical decision-making.

Objective: To assess performance and define their role using 4 recent LLMs as support tools for precision oncology.

Design, Setting, and Participants: This diagnostic study examined 10 fictional cases of patients with advanced cancer with genetic alterations. Each case was submitted to 4 different LLMs (ChatGPT, Galactica, Perplexity, and BioMedLM) and 1 expert physician to identify personalized treatment options in 2023. Treatment options were masked and presented to a molecular tumor board (MTB), whose members rated the likelihood of a treatment option coming from an LLM on a scale from 0 to 10 (0, extremely unlikely; 10, extremely likely) and decided whether the treatment option was clinically useful.

Main Outcomes and Measures: Number of treatment options, precision, recall, F1 score of LLMs compared with human experts, recognizability, and usefulness of recommendations.

Results: For 10 fictional cancer patients (4 with lung cancer, 6 with other; median [IQR] 3.5 [3.0-4.8] molecular alterations per patient), a median (IQR) number of 4.0 (4.0-4.0) compared with 3.0 (3.0-5.0), 7.5 (4.3-9.8), 11.5 (7.8-13.0), and 13.0 (11.3-21.5) treatment options each was identified by the human expert and 4 LLMs, respectively. When considering the expert as a criterion standard, LLM-proposed treatment options reached F1 scores of 0.04, 0.17, 0.14, and 0.19 across all patients combined. Combining treatment options from different LLMs allowed a precision of 0.29 and a recall of 0.29 for an F1 score of 0.29. LLM-generated treatment options were recognized as AI-generated with a median (IQR) 7.5 (5.3-9.0) points in contrast to 2.0 (1.0-3.0) points for manually annotated cases. A crucial reason for identifying AI-generated treatment options was insufficient accompanying evidence. For each patient, at least 1 LLM generated a treatment option that was considered helpful by MTB members. Two unique useful treatment options (including 1 unique treatment strategy) were identified only by LLM.

Conclusions and Relevance: In this diagnostic study, treatment options of LLMs in precision oncology did not reach the quality and credibility of human experts; however, they generated helpful ideas that might have complemented established procedures. Considering technological progress, LLMs could play an increasingly important role in assisting with screening and selecting relevant biomedical literature to support evidence-based, personalized treatment decisions.
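The abstract scores LLM-proposed treatment options against the expert's list (the criterion standard) using precision, recall, and F1, and reports that pooling options from several LLMs gave precision 0.29 and recall 0.29, hence F1 0.29. The following is a minimal Python sketch of that kind of set-based scoring, assuming each treatment option can be matched by exact string equality; the abstract does not describe the study's actual matching rules, and the helper names (`f1_score`, `score_options`) and option labels are hypothetical placeholders, not data from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score_options(predicted: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted options vs. a reference set.

    Assumption for this sketch: options match only if their strings are identical.
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical placeholder options, not data from the study.
expert = {"option_a", "option_b", "option_c", "option_d"}
llm_1 = {"option_a", "option_e", "option_f"}
llm_2 = {"option_b", "option_g"}

print(score_options(llm_1, expert))
# Pooling the outputs of several LLMs is modeled here as a set union before scoring.
print(score_options(llm_1 | llm_2, expert))

# With equal precision and recall, F1 equals that common value.
print(round(f1_score(0.29, 0.29), 2))  # 0.29
```

Because F1 is the harmonic mean of precision and recall, the reported combined F1 of 0.29 follows directly from precision and recall both being 0.29. Note that a longer list of options can only lower precision unless the extra options also match the expert's list, which is consistent with the LLMs proposing more options than the expert but scoring low F1.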


“…In [39] it was found that ChatGPT correctly answered 74% of the trivia questions related to heart diseases. Specifically, ChatGPT's accuracy by domain was: coronary artery disease (80%), pulmonary and venous thrombotic embolism (80%), atrial fibrillation (70%), heart failure (80%), and cardiovascular risk management (60%). [40] evaluated ChatGPT as a support tool for breast tumor board decision making. [41] assessed ChatGPT's capacity for clinical decision support in paediatrics. [7] evaluated the capacity of ChatGPT as a clinical decision support tool in triaging patients for appropriate imaging services. [42] compared the decision-making abilities of humans and LLMs and found a moderate level of agreement between their decisions.…”

Section: LLM as a Decision Support Tool (mentioning)

confidence: 99%

Exploring the role of large language models in radiation emergency response

Chandra, Chakraborty 2024

J. Radiol. Prot.


In recent times, the field of artificial intelligence (AI) has been transformed by the introduction of large language models (LLMs). These models, popularized by OpenAI's GPT-3, have demonstrated the emergent capabilities of AI in comprehending and producing text resembling human language, which has helped them transform several industries. However, their role has yet to be explored in the nuclear industry, specifically in managing radiation emergencies. The present work explores LLMs' contextual awareness, natural language interaction, and capacity to comprehend diverse queries in a radiation emergency response setting. In this study we identify different user types and their specific LLM use cases in radiation emergencies. Their possible interactions with ChatGPT, a popular LLM, have also been simulated, and preliminary results are presented. Drawing on the insights gained from this exercise, and to address concerns of reliability and misinformation, this study advocates for expert-guided, domain-specific LLMs trained on radiation safety protocols and historical data. This study aims to guide radiation emergency management practitioners and decision-makers in effectively incorporating LLMs into their decision support framework.


“…In seven out of ten cases (70%), ChatGPT's recommendations were similar to the tumor board's decisions. Initial results on the use of an LLM as a decision support tool in a breast tumor board give positive insight (14).…”

Section: Kung et al. Evaluated the Performance of ChatGPT on United St... (mentioning)

confidence: 99%

Enhancing interpretation assistance by real-time query resolution in BICR of oncology clinical trials by leveraging ChatGPT bots

Sharma, Farough, Burkett et al. 2024

Medical Imaging 2024: Imaging Informatics for Healthcare, Research, and Applications


Purpose: Blinded Independent Central Review (BICR) is pivotal in maintaining unbiased assessment in oncology clinical trials that employ assessment criteria such as Response Evaluation Criteria In Solid Tumors (RECIST). This paper emphasizes the potential of Large Language Models (LLMs) such as OpenAI's GPT-4 and ChatGPT, trained on clinical trial documents and other reader training materials, to significantly improve interpretation assistance and real-time query resolution. During the central review process these documents are easily accessible to readers, but because most readers work on multiple ongoing clinical trials on a regular basis, searching for details such as trial design, endpoints, and specific reader rules can be a daunting task. Through various pre-trained frameworks using the ChatGPT Application Programming Interface (API), an AI-based chatbot can help readers save time by accurately answering study-design-related questions based on uploaded training documents. If successful, this analysis can open another novel application of LLMs (or ChatGPT) in clinical research and medical imaging.

Methods: This prospective study involved the review of study designs and protocols available from the clinicaltrials.gov database maintained by The National Library of Medicine (NLM) at the National Institutes of Health (NIH). ClinicalTrials.gov is a registry of clinical trials that contains information on clinical studies funded by the NIH, other federal agencies, and private industry. The database includes over 444,000 trials from 221 countries. The NLM works with the US Food and Drug Administration (FDA) to develop and maintain the database. The ChatGPT-based chatbot was trained on clinical trial design data from the respective studies to grasp the intricacies of assessment criteria, patient population, inclusion/exclusion criteria, and other assessment nuances, as applicable. Fine-tuning with prompt engineering ensured that the chatbot understood the language and context specific to BICR. The resulting models serve as intelligent assistants that provide a user-friendly interface for reviewers. Reviewers can engage with the chatbot in natural language, obtaining clarifications on assessment guidelines, terminology, and complex cases. To train the chatbot, we searched the clinicaltrials.gov website for lung cancer studies with the following specifications: "Completed Studies | Studies With Results | Interventional Studies | Lung Cancer | Phase 3 | Study Protocols | Statistical Analysis Plans (SAPs)". Without any bias, we selected the first 3 studies in the search results, along with 7 questions representative of commonly encountered reader queries.

Results: The algorithm supported by ChatGPT was evaluated against a gold-standard medical opinion, determined by a board-certified radiologist with over 20 years of experience in the BICR process. The chatbot provided immediate, contextually accurate insights for all the questions across 3 studies. The trick ques...

