2023
DOI: 10.1002/ijgo.15309

Accuracy and reproducibility of ChatGPT's free version answers about endometriosis

Bahar Yuksel Ozgor,
Melek Azade Simavi

Abstract: Objective: To evaluate the accuracy and reproducibility of ChatGPT's free version answers about endometriosis for the first time. Methods: Detailed internet searches were performed to identify frequently asked questions (FAQs) about endometriosis. Scientific questions were prepared in accordance with the European Society of Human Reproduction and Embryology (ESHRE) endometriosis guidelines. An experienced gynecologist gave a score of 1–4 for each ChatGPT answer. The repeatability of ChatGPT answers about endom…
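The repeatability check described in the Methods can be illustrated with a short script. A minimal sketch, assuming the `openai` Python client as a stand-in; the study itself used ChatGPT's free web interface, and the FAQ list and model name below are hypothetical:

```python
# Sketch only: the paper queried the free ChatGPT web interface by hand;
# the OpenAI API client here is a hypothetical stand-in for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

faqs = [
    "What is endometriosis?",                # hypothetical FAQ
    "Can endometriosis cause infertility?",  # hypothetical FAQ
]

answers: dict[str, list[str]] = {}
for question in faqs:
    runs = []
    for _ in range(2):  # two independent runs per question for repeatability
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumed free-tier model at the time
            messages=[{"role": "user", "content": question}],
        )
        runs.append(resp.choices[0].message.content)
    answers[question] = runs
# A reviewer would then score each stored answer on the 1-4 scale
# and compare the two runs per question.
```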

Cited by 12 publications (4 citation statements). References 14 publications.

Citation statements:
“…As to what specifically regards gynecology, Ozgor and Simavi recently analyzed the accuracy of ChatGPT in answering questions about endometriosis. As many as 91% of questions were answered accurately, although among the questions based on the ESHRE endometriosis guidelines [28], accuracy was lower (67.5%) [29]. Considering the speed at which LLM technologies are expanding, and, consequently, how fast they may improve in preciseness and accuracy, these results are encouraging.…”
Section: Role in the Formulation of Clinical Diagnoses (mentioning, confidence: 83%)
“…Conversely, it requires as well ongoing efforts and resources, besides depending on the quality of feedback. A few studies have investigated reproducibility and repeatability [49,50]. In a study [49] involving emergency physicians, six unique prompts were used in conjunction with 61 patient vignettes to assess ChatGPT's ability to assign Canadian Triage and Acuity Scale (CTAS) scores through 10,980 simulated triages.…”
Section: The Feedback Loop Paradigm (mentioning, confidence: 99%)
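The numbers in the preceding statement are internally consistent: 6 prompts × 61 vignettes gives 366 prompt-vignette pairs, and 10,980 / 366 = 30, implying roughly 30 repetitions per pair under a balanced design. A minimal sketch of one way such repeat-run reproducibility could be quantified, as modal agreement across repetitions; the metric and the example data are assumptions, not necessarily what study [49] used:

```python
from collections import Counter

def modal_agreement(scores: list[int]) -> float:
    """Fraction of repeated runs returning the most common CTAS level.

    1.0 means perfectly reproducible; values near 0.2 would be close
    to chance over the five CTAS levels.
    """
    most_common_count = Counter(scores).most_common(1)[0][1]
    return most_common_count / len(scores)

# Hypothetical example: 30 repetitions of one prompt-vignette pair,
# mostly CTAS 2 with occasional drift to neighboring levels.
runs = [2] * 24 + [3] * 5 + [1]
print(modal_agreement(runs))  # 0.8
```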
“…Of note, providing more detailed prompts resulted in slightly greater reproducibility but did not significantly improve accuracy. In another study [50] assessing ChatGPT's proficiency in answering frequently asked questions (FAQs) about endometriosis, detailed internet searches were used to compile questions, which were then aligned with the European Society of Human Reproduction and Embryology (ESHRE) guidelines. An experienced gynecologist rated ChatGPT's responses on a scale of 1-4.…”
Section: The Feedback Loop Paradigm (mentioning, confidence: 99%)
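A minimal sketch of how accuracy and reproducibility might be tallied from such 1–4 ratings, assuming that a score of 1 denotes a completely accurate answer and that each question was asked in two independent runs (both the score mapping and the two-run design are illustrative assumptions, not details confirmed by the statement above):

```python
def accuracy_rate(ratings: list[int]) -> float:
    """Share of answers scored 1 (assumed: fully accurate)."""
    return sum(r == 1 for r in ratings) / len(ratings)

def reproducibility_rate(first_run: list[int], second_run: list[int]) -> float:
    """Share of questions receiving the same 1-4 score in both runs."""
    assert len(first_run) == len(second_run)
    matches = sum(a == b for a, b in zip(first_run, second_run))
    return matches / len(first_run)

# Illustrative ratings for five questions, each asked twice.
run1 = [1, 1, 2, 1, 3]
run2 = [1, 2, 2, 1, 3]
print(accuracy_rate(run1))               # 0.6
print(reproducibility_rate(run1, run2))  # 0.8
```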
“…Given the ability of generative AI chat tools to quickly produce detailed, fully articulated summaries drawn from a large body of knowledge, evaluating their current performance in responding to clinical questions is critical to understanding how they may eventually be integrated into medical librarians' workflows. Some studies assessing generative AI tools' ability to provide comprehensive and accurate responses to clinical questions have observed that they can produce accurate results [17][18][19][20], particularly for less complex requests [17], although variation in results has been observed among different specialties, tasks, and models investigated [4]. Significant limitations have also been observed, including introduction of both minor and major errors via hallucination or misinterpretation [17,21,22], lack of up-to-date information [23], and limited domain-specific content knowledge [24].…”
Section: Introduction (mentioning, confidence: 99%)