When the results of the Goh et al study1 were presented at a recent National Academies of Medicine meeting, the audience was both amazed and concerned. The randomized clinical trial assessed diagnostic performance by generalist physicians, who were asked to provide diagnoses for 6 simulated cases using either conventional online resources alone or a large language model (LLM) (ChatGPT Plus [GPT-4]; OpenAI) in addition to those standard resources. The study also evaluated the ability of the LLM to solve the cases on its own. The authors developed a rubric for measuring diagnostic performance in which blinded experts evaluated participants' overall clinical reasoning process, including their proposed final diagnosis, their differential diagnosis, and factors supporting or