2023
DOI: 10.1101/2023.10.31.23297825
Preprint

Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study

Giovanni Maria Iannantuono,
Dara Bracken-Clarke,
Fatima Karzai
et al.

Abstract: Background: The capability of large language models (LLMs) to understand and generate human-readable text has prompted the investigation of their potential as educational and management tools for cancer patients and healthcare providers. Materials and Methods: We conducted a cross-sectional study aimed at evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to four domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended que…


Cited by 5 publications (2 citation statements) | References 28 publications
“…GPT-4 achieved the highest overall score, followed by Bard and GPT-3.5. This aligns with previous findings where GPT-4 outperformed GPT-3.5 and Bard in terms of overall correct response rates [9, 11, 15]. Because detailed scoring criteria were not announced for all but the essential questions, we were unable to assess whether the LLMs met the JNDE's passing criteria.…”
Section: Discussion (supporting)
confidence: 92%
“…In English-speaking countries, GPT-4 has been reported to meet the passing criteria for both the United States Medical Licensing Examination and the United Kingdom Medical Licensing Assessment [6-8]. Comparative studies between GPT and Bard have demonstrated GPT-4's superiority in answering several professional questions [9-11].…”
Section: Introduction (mentioning)
confidence: 99%