Comparison of the audiological knowledge of three chatbots – ChatGPT, Bing Chat, and Bard

Jedrzejczak, W. Wiktor; Kochanek, Krzysztof

doi:10.1101/2023.11.22.23298893

Cited by 5 publications

(6 citation statements)

References 25 publications

“…That means one is unable to tell which part of the response they are referencing. In contrast, one other chatbot-Microsoft Copilot (formerly Bing Chat)-provides links to real sources and provides references in the text [42].…”

Section: Discussion (mentioning)

confidence: 99%

ChatGPT for Tinnitus Information and Support: Response Accuracy and Retest after Three and Six Months

Jedrzejczak, Skarzynski, Raj-Koziak et al. 2024, Brain Sciences

Self Cite

Testing of ChatGPT has recently been performed over a diverse range of topics. However, most of these assessments have been based on broad domains of knowledge. Here, we test ChatGPT’s knowledge of tinnitus, an important but specialized aspect of audiology and otolaryngology. Testing involved evaluating ChatGPT’s answers to a defined set of 10 questions on tinnitus. Furthermore, given the technology is advancing quickly, we re-evaluated the responses to the same 10 questions 3 and 6 months later. The accuracy of the responses was rated by 6 experts (the authors) using a Likert scale ranging from 1 to 5. Most of ChatGPT’s responses were rated as satisfactory or better. However, we did detect a few instances where the responses were not accurate and might be considered somewhat misleading. Over the first 3 months, the ratings generally improved, but there was no more significant improvement at 6 months. In our judgment, ChatGPT provided unexpectedly good responses, given that the questions were quite specific. Although no potentially harmful errors were identified, some mistakes could be seen as somewhat misleading. ChatGPT shows great potential if further developed by experts in specific areas, but for now, it is not yet ready for serious application.

“…ChatGPT is a large language model developed by OpenAI and integrates both GPT-3.5 [16] and GPT-4.0 models [15] that comprehend and generate human-like responses through text. Bing Chat is powered by GPT-4.0 and incorporated into the Microsoft Edge browser, which has another capability to generate images and innovative content [9]. Bard AI, a robust Large Language Model (LLM) developed by Google based on Pathways Language Model 2 (PaLM2) and trained on an expansive collection of text and code that exhibits creative content design.…”

Section: Search Strategy and Criteria (mentioning)

confidence: 99%

“…In this context, AI-powered platforms are emerging as potential aides for literature reviews [7]. Contemporary innovations have introduced platforms such as ChatGPT [8], Bing Chat [9] and Bard AI [10]. These tools are not just digital cataloging systems but smart engines that claim to understand and retrieve precise information.…”

Section: Introduction (mentioning)

confidence: 99%

“…This AI framework not only highlights its advanced acumen in information retrieval but also adeptly addresses syntactical inaccuracies, thereby serving as a valuable resource for literature evaluations and the composition of scholarly manuscripts [8]. In a parallel vein, Bing Chat, a product of Microsoft, emerges as an AI-driven conversational agent capable of engendering inventive and novel content, spanning the spectrum from poetic compositions and narratives to code snippets, essays, musical compositions, satirical renditions of celebrities, and visual representations [9]. Akin to its counterparts, Bard AI, the brainchild of Google, assumes its stance as a formidable entity within the domain of AI models, having undergone rigorous training on an expansive corpus encompassing textual and code-oriented knowledge culled from diverse sources, including literary works and academic articles [10].…”

Section: Introduction (mentioning)

confidence: 99%

Navigating the Landscape of Personalized Medicine: The Relevance of ChatGPT, BingChat, and Bard AI in Nephrology Literature Searches

Aiumtrakul, Thongprayoon, Suppadungsuk et al. 2023, JPM

Background and Objectives: Literature reviews are foundational to understanding medical evidence. With AI tools like ChatGPT, Bing Chat and Bard AI emerging as potential aids in this domain, this study aimed to individually assess their citation accuracy within Nephrology, comparing their performance in providing precise. Materials and Methods: We generated the prompt to solicit 20 references in Vancouver style in each 12 Nephrology topics, using ChatGPT, Bing Chat and Bard. We verified the existence and accuracy of the provided references using PubMed, Google Scholar, and Web of Science. We categorized the validity of the references from the AI chatbot into (1) incomplete, (2) fabricated, (3) inaccurate, and (4) accurate. Results: A total of 199 (83%), 158 (66%) and 112 (47%) unique references were provided from ChatGPT, Bing Chat and Bard, respectively. ChatGPT provided 76 (38%) accurate, 82 (41%) inaccurate, 32 (16%) fabricated and 9 (5%) incomplete references. Bing Chat provided 47 (30%) accurate, 77 (49%) inaccurate, 21 (13%) fabricated and 13 (8%) incomplete references. In contrast, Bard provided 3 (3%) accurate, 26 (23%) inaccurate, 71 (63%) fabricated and 12 (11%) incomplete references. The most common error type across platforms was incorrect DOIs. Conclusions: In the field of medicine, the necessity for faultless adherence to research integrity is highlighted, asserting that even small errors cannot be tolerated. The outcomes of this investigation draw attention to inconsistent citation accuracy across the different AI tools evaluated. Despite some promising results, the discrepancies identified call for a cautious and rigorous vetting of AI-sourced references in medicine. Such chatbots, before becoming standard tools, need substantial refinements to assure unwavering precision in their outputs.

“…A search on PubMed for "chatgpt [Title/Abstract] AND audiology [Title/Abstract]" returned no results, compared with 35 papers found for otolaryngology and even more in fields like dermatology and ophthalmology (as of April 5, 2024). Preliminary studies in audiology suggest that while ChatGPT, alongside other chatbots like Google Bard (now Gemini) and Bing Chat (now Copilot), shows promise, it also exhibits errors and inaccuracies that underscore the need for careful oversight when used in specialized fields [9]. This is particularly evident in some audiology subtopics such as tinnitus, where the responses, although quite impressive, occasionally stray from the topic and, crucially, totally lack citations [10].…”

Section: Introduction (mentioning)

confidence: 99%

Accuracy and Repeatability of ChatGPT Based on a Set of Multiple-Choice Questions on Objective Tests of Hearing

Kochanek, Skarzynski, Jedrzejczak 2024, Cureus

Self Cite

Introduction: ChatGPT has been tested in many disciplines, but only a few have involved hearing diagnosis and none to physiology or audiology more generally. The consistency of the chatbot's responses to the same question posed multiple times has not been well investigated either. This study aimed to assess the accuracy and repeatability of ChatGPT 3.5 and 4 on test questions concerning objective measures of hearing. Of particular interest was the short-term repeatability of responses which was here tested on four separate days extended over one week. Methods: We used 30 single-answer, multiple-choice exam questions from a one-year course on objective methods of testing hearing. The questions were posed five times to both ChatGPT 3.5 (the free version) and ChatGPT 4 (the paid version) on each of four days (two days one week and two days the following week). The accuracy of the responses was evaluated in terms of a response key. To evaluate the repeatability of the responses over time, percent agreement and Cohen's Kappa were calculated. Results: The overall accuracy of ChatGPT 3.5 was 48-49%, while that of ChatGPT 4 was 65-69%. ChatGPT 3.5 consistently failed to pass the threshold of 50% correct responses. Within a single day, the percent agreement was 76-79% for ChatGPT 3.5 and 87-88% for ChatGPT 4 (Cohen's Kappa 0.67-0.71 and 0.81-0.84 respectively). The percent agreement between responses from different days was 75-79% for ChatGPT 3.5 and 85-88% for ChatGPT 4 (Cohen's Kappa 0.65-0.69 and 0.80-0.85 respectively). Conclusion: ChatGPT 4 outperforms ChatGPT 3.5 both in accuracy and higher repeatability over time. However, the great variability of the responses casts doubt on possible professional applications of both versions.
