Background: The field of artificial intelligence is rapidly evolving. As an easily accessible platform with vast user engagement, the Chat Generative Pre-Trained Transformer (ChatGPT) holds great promise in medicine, with the latest version, GPT-4, capable of analyzing clinical images.
Objectives: To evaluate ChatGPT as a diagnostic tool and information source in clinical dermatology.
Methods: A total of 15 clinical images were selected from the Danish web atlas Danderm, depicting various common and rare skin conditions. The images were uploaded to ChatGPT (version GPT-4), which was prompted with 'Please provide a description, a potential diagnosis, and treatment options for the following dermatological condition'. The generated responses were assessed by senior registrars in dermatology and consultant dermatologists in terms of accuracy, relevance, and depth (scale 1-5); in addition, the image quality was rated (scale 0-10). Demographic and professional information about the respondents was registered.
Results: A total of 23 physicians participated in the study. The majority of the respondents were consultant dermatologists (83%), and 48% had more than 10 years of training. The overall image quality had a median rating of 10 out of 10 [interquartile range (IQR): 9-10]. The overall median rating of the ChatGPT-generated responses was 2 (IQR: 1-4), while the overall median ratings in terms of relevance, accuracy, and depth were 2 (IQR: 1-4), 3 (IQR: 2-4), and 2 (IQR: 1-3), respectively.
Conclusions: Despite the advancements in ChatGPT, including newly added image processing capabilities, the chatbot demonstrated significant limitations in providing reliable and clinically useful responses to illustrative images of various dermatological conditions.
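The ratings in this study are summarized as medians with interquartile ranges. As a generic illustration of how such summaries can be computed (this is a minimal sketch, not the authors' analysis code, and the rating values below are invented placeholders rather than study data), the following Python snippet uses NumPy:

```python
import numpy as np

# Placeholder reviewer ratings on the study's 1-5 scale
# (invented for illustration; not the actual study data).
ratings = np.array([2, 1, 4, 2, 3, 1, 2, 4, 1, 3, 2, 2])

median = np.median(ratings)                # central tendency of the ratings
q1, q3 = np.percentile(ratings, [25, 75])  # interquartile range bounds

print(f"Median rating: {median:.0f} (IQR: {q1:.0f}-{q3:.0f})")
```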
Objective: The purpose of this study was to evaluate the performance of advanced large language models from OpenAI (GPT-3.5 and GPT-4), Google (PaLM2 and MedPaLM), and an open-source model from Meta (Llama3:70b) in answering clinical multiple-choice test questions in the field of otolaryngology-head and neck surgery.
Methods: A dataset of 4566 otolaryngology questions was used; each model was provided a standardized prompt followed by a question. One hundred questions that were answered incorrectly by all models were further interrogated to gain insight into the causes of incorrect answers.
Results: GPT-4 was the most accurate, correctly answering 3520 of 4566 questions (77.1%). MedPaLM correctly answered 3223 of 4566 (70.6%) questions, while Llama3:70b, GPT-3.5, and PaLM2 were correct on 3052 of 4566 (66.8%), 2672 of 4566 (58.5%), and 2583 of 4566 (56.5%) questions, respectively. Three hundred and sixty-nine questions were answered incorrectly by all models. Prompts to provide reasoning improved accuracy in all models: GPT-4 changed from an incorrect to a correct answer 31% of the time, while GPT-3.5, Llama3, PaLM2, and MedPaLM corrected their responses 25%, 18%, 19%, and 17% of the time, respectively.
Conclusion: Large language models vary in their understanding of otolaryngology-specific clinical knowledge. OpenAI's GPT-4 has a strong understanding of core concepts as well as detailed information in the field of otolaryngology. Its baseline understanding of this field makes it well suited to serve in roles related to head and neck surgery education, provided that appropriate precautions are taken and potential limitations are understood.
Level of Evidence: N/A. Laryngoscope, 2024.
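For readers who want to see how the accuracy percentages above follow from the raw tallies, here is a minimal Python sketch; the counts are copied from the abstract, the dictionary structure is purely illustrative, and rounding in the last decimal may differ slightly from the published figures.

```python
# Correct-answer tallies reported in the abstract (out of 4566 questions).
correct_counts = {
    "GPT-4": 3520,
    "MedPaLM": 3223,
    "Llama3:70b": 3052,
    "GPT-3.5": 2672,
    "PaLM2": 2583,
}
TOTAL_QUESTIONS = 4566

# Accuracy = correct answers / total questions, expressed as a percentage.
for model, correct in correct_counts.items():
    accuracy = 100 * correct / TOTAL_QUESTIONS
    print(f"{model}: {correct}/{TOTAL_QUESTIONS} correct ({accuracy:.1f}%)")
```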
This review aims to provide a summary of all scientific publications on the use of large language models (LLMs) in medical education over the first year of their availability. A scoping literature review was conducted in accordance with the PRISMA recommendations for scoping reviews. Five scientific literature databases were searched using predefined search terms. The search yielded 1509 initial results, of which 145 studies were ultimately included. Most studies assessed LLMs' capabilities in passing medical exams. Some studies discussed advantages, disadvantages, and potential use cases of LLMs. Very few studies conducted empirical research, and many published studies lack methodological rigor. We therefore propose a research agenda to improve the quality of studies on LLMs.