2024
DOI: 10.1007/s10916-024-02056-0

Responses of Five Different Artificial Intelligence Chatbots to the Top Searched Queries About Erectile Dysfunction: A Comparative Analysis

Mehmet Fatih Şahin,
Hüseyin Ateş,
Anıl Keleş
et al.

Abstract: The aim of the study is to evaluate and compare the quality and readability of responses generated by five different artificial intelligence (AI) chatbots—ChatGPT, Bard, Bing, Ernie, and Copilot—to the top searched queries of erectile dysfunction (ED). Google Trends was used to identify ED-related relevant phrases. Each AI chatbot received a specific sequence of 25 frequently searched terms as input. Responses were evaluated using DISCERN, Ensuring Quality Information for Patients (EQIP), and Flesch-Kincaid Gr…

Cited by 10 publications (2 citation statements)

References 25 publications
“…Temel et al reported that content created by ChatGPT regarding spinal cord injury had substantial difficulties with quality [6]. Şahin et al compared 5 different AI chatbots about erectile dysfunction and found that none of the chatbots had the required level of readability and quality [25]. Similar to all this research, our study also demonstrated that content about DED created by ChatGPT has significant quality issues.…”
mentioning
confidence: 99%
“…In the ‘Materials and Methods’ section, the authors detail their analysis of these responses using various readability scales, including DISCERN, Ensuring Quality Information for Patients (EQIP), and the Flesch-Kincaid Grade Level (FKGL) and Reading Ease (FKRE). Additionally, they disclose the names of the five AI-based chatbots used to generate the responses under analysis [1].…”
mentioning
confidence: 99%