2023
DOI: 10.1002/ijgo.15309

Accuracy and reproducibility of ChatGPT's free version answers about endometriosis

Bahar Yuksel Ozgor,
Melek Azade Simavi

Abstract: Objective: To evaluate the accuracy and reproducibility of ChatGPT's free version answers about endometriosis for the first time. Methods: Detailed internet searches were performed to identify frequently asked questions (FAQs) about endometriosis. Scientific questions were prepared in accordance with the European Society of Human Reproduction and Embryology (ESHRE) endometriosis guidelines. An experienced gynecologist gave a score of 1–4 for each ChatGPT answer. The repeatability of ChatGPT answers about endom…
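The repeatability check described in the Methods can be illustrated with a short script. A minimal sketch, assuming the `openai` Python client as a stand-in; the study itself used ChatGPT's free web interface, and the FAQ list and model name below are hypothetical:

```python
# Sketch only: the paper queried the free ChatGPT web interface by hand;
# the OpenAI API client here is a hypothetical stand-in for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

faqs = [
    "What is endometriosis?",                # hypothetical FAQ
    "Can endometriosis cause infertility?",  # hypothetical FAQ
]

answers: dict[str, list[str]] = {}
for question in faqs:
    runs = []
    for _ in range(2):  # two independent runs per question for repeatability
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumed free-tier model at the time
            messages=[{"role": "user", "content": question}],
        )
        runs.append(resp.choices[0].message.content)
    answers[question] = runs
# A reviewer would then score each stored answer on the 1-4 scale
# and compare the two runs per question.
```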

Cited by 12 publications (4 citation statements). References 14 publications.

Citation statements:
“…As to what specifically regards gynecology, Ozgor and Simavi recently analyzed the accuracy of ChatGPT in answering questions about endometriosis. As many as 91% of questions were answered accurately, although among the questions based on the ESHRE endometriosis guidelines [28], accuracy was lower (67.5%) [29]. Considering the speed at which LLM technologies are expanding, and, consequently, how fast they may improve in preciseness and accuracy, these results are encouraging.…”
Section: Role in the Formulation of Clinical Diagnoses (mentioning, confidence: 83%)
“…Conversely, it requires as well ongoing efforts and resources, besides depending on the quality of feedback. A few studies have investigated reproducibility and repeatability [49,50]. In a study [49] involving emergency physicians, six unique prompts were used in conjunction with 61 patient vignettes to assess ChatGPT's ability to assign Canadian Triage and Acuity Scale (CTAS) scores through 10,980 simulated triages.…”
Section: The Feedback Loop Paradigm (mentioning, confidence: 99%)
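The numbers in the preceding statement are internally consistent: 6 prompts × 61 vignettes gives 366 prompt-vignette pairs, and 10,980 / 366 = 30, implying roughly 30 repetitions per pair under a balanced design. A minimal sketch of one way such repeat-run reproducibility could be quantified, as modal agreement across repetitions; the metric and the example data are assumptions, not necessarily what study [49] used:

```python
from collections import Counter

def modal_agreement(scores: list[int]) -> float:
    """Fraction of repeated runs returning the most common CTAS level.

    1.0 means perfectly reproducible; values near 0.2 would be close
    to chance over the five CTAS levels.
    """
    most_common_count = Counter(scores).most_common(1)[0][1]
    return most_common_count / len(scores)

# Hypothetical example: 30 repetitions of one prompt-vignette pair,
# mostly CTAS 2 with occasional drift to neighboring levels.
runs = [2] * 24 + [3] * 5 + [1]
print(modal_agreement(runs))  # 0.8
```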
“…Of note, providing more detailed prompts resulted in slightly greater reproducibility but did not significantly improve accuracy. In another study [50] assessing ChatGPT's proficiency in answering frequently asked questions (FAQs) about endometriosis, detailed internet searches were used to compile questions, which were then aligned with the European Society of Human Reproduction and Embryology (ESHRE) guidelines. An experienced gynecologist rated ChatGPT's responses on a scale of 1-4.…”
Section: The Feedback Loop Paradigm (mentioning, confidence: 99%)
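A minimal sketch of how accuracy and reproducibility might be tallied from such 1–4 ratings, assuming that a score of 1 denotes a completely accurate answer and that each question was asked in two independent runs (both the score mapping and the two-run design are illustrative assumptions, not details confirmed by the statement above):

```python
def accuracy_rate(ratings: list[int]) -> float:
    """Share of answers scored 1 (assumed: fully accurate)."""
    return sum(r == 1 for r in ratings) / len(ratings)

def reproducibility_rate(first_run: list[int], second_run: list[int]) -> float:
    """Share of questions receiving the same 1-4 score in both runs."""
    assert len(first_run) == len(second_run)
    matches = sum(a == b for a, b in zip(first_run, second_run))
    return matches / len(first_run)

# Illustrative ratings for five questions, each asked twice.
run1 = [1, 1, 2, 1, 3]
run2 = [1, 2, 2, 1, 3]
print(accuracy_rate(run1))               # 0.6
print(reproducibility_rate(run1, run2))  # 0.8
```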
“…Given the ability of generative AI chat tools to quickly produce detailed, fully articulated summaries drawn from a large body of knowledge, evaluating their current performance in responding to clinical questions is critical to understanding how they may eventually be integrated into medical librarians' workflows. Some studies assessing generative AI tools' ability to provide comprehensive and accurate responses to clinical questions have observed that they can produce accurate results [17][18][19][20], particularly for less complex requests [17], although variation in results has been observed among different specialties, tasks, and models investigated [4]. Significant limitations have also been observed, including introduction of both minor and major errors via hallucination or misinterpretation [17,21,22], lack of up-to-date information [23], and limited domain-specific content knowledge [24].…”
Section: Introduction (mentioning, confidence: 99%)