ChatGPT’s inconsistent moral advice influences users’ judgment

Krügel, Sebastian; Ostermaier, Andreas; Uhl, Matthias

doi:10.1038/s41598-023-31341-0

Cited by 76 publications

(36 citation statements)

References 18 publications

“…While we hypothesize that ChatGPT's diminished performance in the July 2023 exams might stem from exposure to new test questions, it is also possible that its inherent inconsistencies contributed to the initially observed low scores in July 2023. 40 , 41 Third, we did not evaluate the appropriateness or logical consistency of ChatGPT's reasoning for each question. Fourth, we relied on official answers provided on the Ministry of Examination's website in Taiwan as the benchmark for correctness.…”

Section: Discussion (mentioning)

confidence: 99%

Exploring the proficiency of ChatGPT-4: An evaluation of its performance in the Taiwan advanced medical licensing examination

Lin, Chan, Hsu et al. 2024

DIGITAL HEALTH

Background: Taiwan is well known for its quality healthcare system. The country's medical licensing exams offer a way to evaluate ChatGPT's medical proficiency. Methods: We analyzed exam data from February 2022, July 2022, February 2023, and July 2023. Each exam included four papers with 80 single-choice questions, grouped as descriptive or picture-based. We used ChatGPT-4 for evaluation. Incorrect answers prompted a “chain of thought” approach. Accuracy rates were calculated as percentages. Results: ChatGPT-4's accuracy in medical exams ranged from 63.75% to 93.75% (February 2022–July 2023). The highest accuracy (93.75%) was in February 2022's Medicine Exam (3). Subjects with the highest misanswered rates were ophthalmology (28.95%), breast surgery (27.27%), plastic surgery (26.67%), orthopedics (25.00%), and general surgery (24.59%). While using “chain of thought,” the “Accuracy of (CoT) prompting” ranged from 0.00% to 88.89%, and the final overall accuracy rate ranged from 90% to 98%. Conclusion: ChatGPT-4 succeeded in Taiwan's medical licensing exams. With the “chain of thought” prompt, it improved accuracy to over 90%.

“…Additionally, ChatGPT incorporates a degree of randomness, resulting in variations in its responses even when faced with the same question asked repeatedly. 41 However, large language models like ChatGPT predict each subsequent word based on the preceding context. This allows for multitude of ways to express the same idea with different phrasings.…”

Section: Discussion (mentioning)

confidence: 99%

A User-friendly Approach for the Diagnosis of Diabetic Retinopathy Using ChatGPT and Automated Machine Learning

Mohammadi, Nguyen 2024

Ophthalmology Science

“…Ultimately, scientists may one day outsource the coding itself, but will still need to be trained in how to prompt AI tools appropriately (Zamfirescu-Pereira et al, 2023), how to assess the validity of their outputs (Passi and Vorvoreanu, n.d.; Zombies in the Loop? 2023), and to consider the societal implications and applications of these outputs (Tomašev et al, 2020;Krügel et al, 2023).…”

Section: E14-6 (mentioning)

confidence: 99%

Opening a conversation on responsible environmental data science in the age of large language models

Oliver, Chapman, Emery et al. 2024

Environ. Data Science

The general public and scientific community alike are abuzz over the release of ChatGPT and GPT-4. Among many concerns being raised about the emergence and widespread use of tools based on large language models (LLMs) is the potential for them to propagate biases and inequities. We hope to open a conversation within the environmental data science community to encourage the circumspect and responsible use of LLMs. Here, we pose a series of questions aimed at fostering discussion and initiating a larger dialogue. To improve literacy on these tools, we provide background information on the LLMs that underpin tools like ChatGPT. We identify key areas in research and teaching in environmental data science where these tools may be applied, and discuss limitations to their use and points of concern. We also discuss ethical considerations surrounding the use of LLMs to ensure that as environmental data scientists, researchers, and instructors, we can make well-considered and informed choices about engagement with these tools. Our goal is to spark forward-looking discussion and research on how as a community we can responsibly integrate generative AI technologies into our work.
