ChatGPT: Jack of all trades, master of none

Kocoń, Jan; Cichecki, Igor; Kaszyca, Oliwier; Kochanek, Mateusz; Szydło, Dominika; Baran, Joanna; Bielaniewicz, Julita; Gruza, Marcin; Janz, Arkadiusz; Kanclerz, Kamil; Kocoń, A.; Koptyra, Bartłomiej; Mieleszczenko-Kowszewicz, Wiktoria; Miłkowski, Piotr; Oleksy, Marcin; Piasecki, Maciej; Radliński, Łukasz; Wojtasik, Konrad; Woźniak, S.; Kazienko, Przemysław

doi:10.1016/j.inffus.2023.101861

Cited by 257 publications

(55 citation statements)

References 13 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In the most immediate future, it is likely that AI will be used to at least partially replace subjective human coders for the purpose of inferring psychologically and socially meaningful phenomena from data. Today, research demonstrates the degree to which AI and humans judge content similarly (Gilardi et al, 2023;Kocoń et al, 2023;Zambrano et al, 2023), strongly implying the most immediate areas of interest and excitement for how AI can be used to automate highly common, but tedious, tasks that have already long been a part of the "human only" territory in the social science workflow.…”

Section: Discussionmentioning

confidence: 99%

From silicon to solutions: AI's impending impact on research and discovery

Markowitz,

Boyd,

Blackburn

2024

Front. Soc. Psychol.

View full text Add to dashboard Cite

The social sciences have long relied on comparative work as the foundation upon which we understand the complexities of human behavior and society. However, as we go deeper into the era of artificial intelligence (AI), it becomes imperative to move beyond mere comparison (e.g., how AI compares to humans across a range of tasks) to establish a visionary agenda for AI as collaborative partners in the pursuit of knowledge and scientific inquiry. This paper articulates an agenda that envisions AI models as the preeminent scientific collaborators. We advocate for the profound notion that our thinking should evolve to anticipate, and include, AI models as one of the most impactful tools in the social scientist's toolbox, offering assistance and collaboration with low-level tasks (e.g., analysis and interpretation of research findings) and high-level tasks (e.g., the discovery of new academic frontiers) alike. This transformation requires us to imagine AI's possible/probable roles in the research process. We defend the inevitable benefits of AI as knowledge generators and research collaborators—agents who facilitate the scientific journey, aiming to make complex human issues more tractable and comprehensible. We foresee AI tools acting as co-researchers, contributing to research proposals and driving breakthrough discoveries. Ethical considerations are paramount, encompassing democratizing access to AI tools, fostering interdisciplinary collaborations, ensuring transparency, fairness, and privacy in AI-driven research, and addressing limitations and biases in large language models. Embracing AI as collaborative partners will revolutionize the landscape of social sciences, enabling innovative, inclusive, and ethically sound research practices.

show abstract

Section: Discussionmentioning

confidence: 99%

From silicon to solutions: AI's impending impact on research and discovery

Markowitz,

Boyd,

Blackburn

2024

Front. Soc. Psychol.

View full text Add to dashboard Cite

show abstract

“…In contrast, the current roles attributed to chatGPT appear to surpass some human cognitive abilities (Koubaa et al, 2023), being able to tackle various technical domains with great precision in its responses (Orrù et al, 2023;Susnjak, 2022;Urban et al, 2023), even displaying signs of artificial general intelligence (Bubeck et al, 2023). Indeed, in a study, ChatGPT was compared with state-of-the-art algorithms, and it was demonstrated that ChatGPT exhibited up to a 25% lower performance for zero-shot and few-shot evaluation (Kocoń et al, 2023). Furthermore, interactions with chatGPT take place via "prompts," which are written verbal texts designed for communicative purposes, often encompassing instructions that chatGPT is compelled to adhere to (White et al, 2023).…”

Section: Cultural Modulation Of Neural Mechanisms Underlying Cognitionmentioning

confidence: 99%

Potential cognitive risks of generative transformer-based AI chatbots on higher order executive functions.

León-Domínguez

2024

Neuropsychology

View full text Add to dashboard Cite

Background: Chat generative retrained transformer (ChatGPT) represents a groundbreaking advancement in Artificial Intelligence (AI-chatbot) technology, utilizing transformer algorithms to enhance natural language processing and facilitating their use for addressing specific tasks. These AI chatbots can respond to questions by generating verbal instructions similar to those a person would provide during the problemsolving process. Aim: ChatGPT has become the fastest growing software in terms of user adoption in history, leading to an anticipated widespread use of this technology in the general population. Current literature is predominantly focused on the functional aspects of these technologies, but the field has not yet explored hypotheses on how these AI chatbots could impact the evolutionary aspects of human cognitive development. Thesis: The "neuronal recycling hypothesis" posits that the brain undergoes structural transformation by incorporating new cultural tools into "neural niches," consequently altering individual cognition. In the case of technological tools, it has been established that they reduce the cognitive demand needed to solve tasks through a process called "cognitive offloading." In this theoretical article, three hypotheses were proposed via forward inference about how algorithms such as ChatGPT and similar models may influence the cognitive processes and structures of upcoming generations. Conclusions: By forecasting the neurocognitive effects of these technologies, educational and political communities can anticipate future scenarios and formulate strategic plans to either mitigate or enhance the cognitive influence that these factors may have on the general population. Key PointsQuestion: Can the constant and pervasive use of AI chatbots alter our cognitive development? Findings: The pervasive use of AI chatbots may impair the efficiency of higher cognitive functions, such as problem-solving. Importance: Anticipating AI chatbots' impact on human cognition enables the development of interventions to counteract potential negative effects. Next Steps: Design and execute experimental studies investigating the positive and negative effects of AI chatbots on human cognition.

show abstract

“…ChatGPT has been described as a "jack of all trades, master of none". [9] Nonetheless, it is already being explored by doctors and patients as an adjunct to the traditional medical pathway. [10][11][12][13][14] Ethical concerns regarding this technology are more prevalent than ever, encompassing issues of bias, information governance, patient confidentiality, transparency and accountability.…”

Section: Introductionmentioning

confidence: 99%

Exploring the Capabilities of ChatGPT in Women’s Health

Bachmann,

Duta,

Mazey

et al. 2024

Preprint

View full text Add to dashboard Cite

Introduction: Artificial Intelligence (AI) is redefining healthcare, with Large Language Models (LLMs) like ChatGPT offering novel and powerful capabilities in processing and generating human-like information. These advancements offer potential improvements in Women's Health, particularly Obstetrics and Gynaecology (O&G), where diagnostic and treatment gaps have long existed. Despite its generalist nature, ChatGPT is increasingly being tested in healthcare, necessitating a critical analysis of its utility, limitations and safety. This study examines ChatGPT's performance in interpreting and responding to international gold standard benchmark assessments in O&G: the RCOG's MRCOG Part One and Two examinations. We evaluate ChatGPT's domain- and knowledge area-specific accuracy, the influence of linguistic complexity on performance and its self-assessment confidence and uncertainty, essential for safe clinical decision-making. Methods: A dataset of MRCOG examination questions from sources beyond the reach of LLMs was developed to mitigate the risk of ChatGPT's prior exposure. A dual-review process validated the technical and clinical accuracy of the questions, omitting those dependent on previous content, duplicates, or requiring image interpretation. Single Best Answer (SBA) and Extended Matching (EMQ) Questions were converted to JSON format to facilitate ChatGPT's interpretation, incorporating question types and background information. Interaction with ChatGPT was conducted via OpenAI's API, structured to ensure consistent, contextually informed responses from ChatGPT. The response from ChatGPT was recorded and compared against the known accurate response. Linguistic complexity was evaluated using unique token counts and Type-Token ratios (vocabulary breadth and diversity) to explore their influence on performance. ChatGPT was instructed to assign confidence scores to its answers (0-100%), reflecting its self-perceived accuracy. Responses were categorized by correctness and statistically analysed through entropy calculation, assessing ChatGPT's capacity for self-evaluating certainty and knowledge boundaries. Findings: Of 1,824 MRCOG Part One and Two questions, ChatGPT's accuracy on MRCOG Part One was 72.2% (95% CI 69.2-75.3). For Part Two, it achieved 50.4% accuracy (95% CI 47.2-53.5) with 534 correct out of 989 questions, performing better on SBAs (54.0%, 95% CI 50.0-58.0) than on EMQs (45.0%, 95% CI 40.1-49.9). In domain-specific performance, the highest accuracy was in Biochemistry (79.8%, 95% CI 71.4-88.1) and the lowest in Biophysics (51.4%, 95% CI 35.2-67.5). The best-performing subject in Part Two was Urogynaecology (63.0%, 95% CI 50.1-75.8) and the worst was Management of Labour (35.6%, 95% CI 21.6-49.5). Linguistic complexity analysis showed a marginal increase in unique token count for correct answers in Part One (median 122, IQR 114-134) compared to incorrect (median 120, IQR 112-131, p=0.05). TTR analysis revealed higher medians for correct answers with negligible effect sizes (Part One: 0.66, IQR 0.63-0.68; Part Two: 0.62, IQR 0.57-0.67) and p-values <0.001. Regarding self-assessed confidence, the median confidence for correct answers was 70.0% (IQR 60-90), the same as for incorrect choices identified as correct (p<0.001). For correct answers deemed incorrect, the median confidence was 10.0% (IQR 0-10), and for incorrect answers accurately identified, it was 5.0% (IQR 0-10, p<0.001). Entropy values were identical for correct and incorrect responses (median 1.46, IQR 0.44-1.77), indicating no discernible distinction in ChatGPT's prediction certainty. Conclusions: ChatGPT demonstrated commendable accuracy in basic medical queries on the MRCOG Part One, yet its performance was markedly reduced in the clinically demanding Part Two exam. The model's high self-confidence across correct and incorrect responses necessitates scrutiny for its application in clinical decision-making. These findings suggest that while ChatGPT has potential, its current form requires significant refinement before it can enhance diagnostic efficacy and clinical workflow in women's health.

show abstract

ChatGPT: Jack of all trades, master of none

Cited by 257 publications

References 13 publications

From silicon to solutions: AI's impending impact on research and discovery

From silicon to solutions: AI's impending impact on research and discovery

Potential cognitive risks of generative transformer-based AI chatbots on higher order executive functions.

Exploring the Capabilities of ChatGPT in Women’s Health

Contact Info

Product

Resources

About