Validation of the QAMAI tool to assess the quality of health information provided by AI

Vaira, Luigi Angelo; Lechien, Jerome R.; Abbate, Vincenzo; Allevi, Fabiana; Audino, Giovanni; Beltramini, Giada Anna; Bergonzani, Michela; Boscolo-Rizzo, Paolo; Califano, Gianluigi; Cammaroto, Giovanni; Chiesa-Estomba, Carlos M.; Committeri, Umberto; Crimi, Salvatore; Curran, Nicholas R.; di Bello, Francesco; di Stadio, Arianna; Frosolini, Andrea; Gabriele, Guido; Gengler, Isabelle M.; Lonardi, Fabio; Maniaci, Antonino; Maglitto, Fabio; Mayo-Yáñez, Miguel; Petrocelli, Marzia; Pucci, Resi; Saibene, Alberto Maria; Saponaro, Gianmarco; Tel, Alessandro; Trabalzini, Franco; Trecca, Eleonora M.C.; Vellone, Valentino; Salzano, Giovanni; De Riu, Giacomo

doi:10.1101/2024.01.25.24301774

Cited by 2 publications

(2 citation statements)

References 40 publications

“…Recent studies have significantly contributed to understanding the potential and limitations of AI in otolaryngology, emphasizing the need for rigorous validation of AI tools before their integration into clinical practice. For instance, the development and validation of the QAMAI tool demonstrate a systematic approach to evaluate AI-generated health information, showing robust construct validity and high internal consistency which could be instrumental in ensuring the reliability of AI platforms, including ChatGPT, within otolaryngology settings ( 23 ). Furthermore, the complexity of using AI for synthesizing clinical guidelines is highlighted by the variability in AI responses compared to expert consensus, underscoring the necessity for AI to be used with caution, particularly in complex medical fields like otolaryngology ( 24 ).…”

Section: Discussion (mentioning)

confidence: 99%

Evaluating ChatGPT-4’s performance as a digital health advisor for otosclerosis surgery

Sahin, Erkmen, Duymaz et al. 2024, Front. Surg.


Purpose: This study aims to evaluate the effectiveness of ChatGPT-4, an artificial intelligence (AI) chatbot, in providing accurate and comprehensible information to patients regarding otosclerosis surgery.

Methods: On October 20, 2023, 15 hypothetical questions were posed to ChatGPT-4 to simulate physician-patient interactions about otosclerosis surgery. Responses were evaluated by three independent ENT specialists using the DISCERN scoring system. The readability was evaluated using multiple indices: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (Gunning FOG), Simple Measure of Gobbledygook (SMOG), Coleman-Liau Index (CLI), and Automated Readability Index (ARI).

Results: The responses from ChatGPT-4 received DISCERN scores ranging from poor to excellent, with an overall score of 50.7 ± 8.2. The readability analysis indicated that the texts were above the 6th-grade level, suggesting they may not be easily comprehensible to the average reader. There was a significant positive correlation between the referees’ scores. Despite providing correct information in over 90% of the cases, the study highlights concerns regarding the potential for incomplete or misleading answers and the high readability level of the responses.

Conclusion: While ChatGPT-4 shows potential in delivering health information accurately, its utility is limited by the level of readability of its responses. The study underscores the need for continuous improvement in AI systems to ensure the delivery of information that is both accurate and accessible to patients with varying levels of health literacy. Healthcare professionals should supervise the use of such technologies to enhance patient education and care.

“…The QAMAI tool stems from a methodology that has been well-validated and extensively applied for evaluating the quality of health information across various platforms, including websites [9], social networks [10], and other multimedia channels [11]. Through a panel of experts' guidelines, this tool underwent validation for its construct validity, internal consistency, and reliability using 40 LLM responses on colorectal surgery [12]. Opting for a qualitative analysis enabled the capture of subtleties and nuances in user experience and perception that frequently preclude quantitative measures.…”

Section: Methods (mentioning)

confidence: 99%

Can AI answer my questions? Using Artificial Intelligence to help provide information for patients with a stoma

Lim, Lirios, Sakalkale et al. 2024, Preprint


Background: Stomas present significant lifestyle and psychological challenges for patients, requiring comprehensive education and support. Current educational methods have limitations in offering relevant information to the patient, highlighting a potential role for Artificial Intelligence (AI). This study examined the utility of AI in enhancing stoma therapy management following colorectal surgery.

Material and Methods: We compared the efficacy of four prominent Large Language Models (LLM), namely OpenAI's ChatGPT-3.5 and ChatGPT-4.0, Google's Gemini, and Bing's CoPilot, against a series of metrics to evaluate their suitability as supplementary clinical tools. Through qualitative and quantitative analyses, including readability scores (Flesch-Kincaid, Flesch-Reading Ease, and Coleman-Liau index) and reliability assessments (Likert scale, DISCERN score and QAMAI tool), the study aimed to assess the appropriateness of LLM-generated advice for patients managing stomas.

Results: There are varying degrees of readability and reliability across the evaluated models, with CoPilot and ChatGPT-4 demonstrating superior performance in several key metrics such as readability and comprehensiveness. However, the study underscores the infant stage of LLM technology in clinical applications. All responses required high school to college level education to comprehend comfortably. While the LLMs addressed users’ questions directly, the absence of incorporating patient-specific factors such as past medical history generated broad and generic responses rather than offering tailored advice.

Conclusion: The complexity of individual patient conditions can challenge AI systems. The use of LLMs in clinical settings holds promise for improving patient education and stoma management support, but requires careful consideration of the models' capabilities and the context of their use.
