2023
DOI: 10.1007/s00296-023-05473-5

Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use

Belkis Nihan Coskun, Burcu Yagiz, Gokhan Ocakoglu, et al.

Cited by 20 publications (6 citation statements) · References: 26 publications
“…This study found that ChatGPT provided more detailed and accurate responses to patient questions about ROP, with 98% of answers falling into the “agreed” or “strongly agreed” category compared to BingAI and Gemini. A similar result was found by Coskun et al [ 16 ] in questions about methotrexate use, as ChatGPT achieved a 100% correct answer rate, while Bard (currently known as Gemini) and BingAI scored 73.91%. In another study assessing the quality and readability of AI chatbot-generated answers to frequently asked clinical inquiries in the field of bariatric and metabolic surgery, a significant difference was observed in the proportion of appropriate answers among the three LLMs: ChatGPT-4 led with 85.7%, followed by Bard at 74.3%, and BingAI at 25.7% [ 26 ].…”
Section: Discussion (supporting)
confidence: 86%
“…Each of these models—ChatGPT-4 with its broad conversational capabilities, BingAI with its research-centric prowess, and Gemini with its real-time information synthesis—reflects the strategic priorities of their respective developers and offers distinct advantages depending on the application. Therefore, each may behave differently in response to patient inquiries about medical conditions [ 16 , 17 ]. Similar studies in the ophthalmology literature also report varying results regarding the success of these LLMs in providing professional medical information or responding to patient inquiries [ 8 , 9 , 10 , 11 , 12 ].…”
Section: Introduction (mentioning)
confidence: 99%
“…We also reviewed a study assessing the accuracy and completeness of several LLMs when answering methotrexate-related questions [23]. This study was excluded because it focused solely on the pharmacological treatment of rheumatic disease. For a detailed breakdown of the inclusion and exclusion process at each stage, please refer to the PRISMA flowchart in Figure 1.…”
Section: Screening Results (mentioning)
confidence: 99%
“…The investigations have yielded various results about the superiority and performance of the three AI-based chatbots. In a study on methotrexate use, ChatGPT-3.5, Bard, and Bing had correct answer rates of 100%, 73.91%, and 73.91%, respectively [33]. A comparative study in endodontics also found that ChatGPT-3.5 offered more reliable information than Bard and Bing [34].…”
Section: Discussion (mentioning)
confidence: 99%