2023
DOI: 10.1101/2023.07.07.23292385
Preprint
Appropriateness of ChatGPT in answering heart failure related questions

Abstract: Background: Heart failure requires complex management, and increased patient knowledge has been shown to improve outcomes. The large language model (LLM) Chat Generative Pre-Trained Transformer (ChatGPT) may be a useful supplemental source of information for patients with heart failure. Methods: Responses produced by GPT-3.5 and GPT-4 to 107 frequently asked heart failure-related questions were graded by two reviewers board-certified in cardiology, with differences resolved by a third reviewer. The reproducibility an…

Cited by 11 publications (10 citation statements)
References 12 publications
“…One study (n=1/89, 1.1%) was published in 2022 [24], 84 (n=84/89, 94.4%) in 2023 [13,25–107], and 4 (n=4/89, 4.5%) in 2024 [108–111] (all of which were peer-reviewed publications of preprints published in 2023). Most studies were quantitative non-randomized (n=84/89, 94.4%) [13,25–27,29–101,103,104,106,107,109–111], 4 (n=4/89, 4.5%) [28,102,105,108] had a qualitative study design, and one (n=1/89, 1.1%) [24] was quantitative randomized according to the MMAT 2018 criteria. However, the LLM outputs were often first analyzed quantitatively but followed by a qualitative analysis of certain responses.…”
Section: Results
Confidence: 99%
“…The authors were primarily affiliated with institutions in the United States (n=47 of 122 different countries identified per publication, 38.5%), followed by Germany (n=11/122, 9%), Turkey (n=7/122, 5.7%), the United Kingdom (n=6/122, 4.9%), China/Australia/Italy (n=5/122, 4.1%, respectively), and 24 (n=36/122, 29.5%) other countries. Most studies examined one or more applications based on the GPT-3.5 architecture (n=66 of 124 different LLMs examined per study, 53.2%) [13,26–29,31–34,36–40,42–49,52–54,56–61,63,65–67,71,72,74,75,77,78,81–89,91,92,94,95,97–100,102–104,106–109,111], followed by GPT-4 (n=33/124, 26.6%) [13,25,27,29,30,34–36,41,43,50,51,54,55,58,61,64,68–70,74,76,79–81,83,87,89,90,93,96,98,99,101,105], Bard (n=10/124, 8.1%; now known as Gemini) [33,48,49,55,73,74,80,87,94,99], Bing Chat (n=7/124, 5.7%; now Microsoft Copilot) [49,51,55,73,94,99,110], and other applications based on Bidirectional Encoder Representations from Transformers (BERT; n=4/124, 3...…”
Section: Results
Confidence: 99%