2023
DOI: 10.1111/cogs.13315
The Puzzle of Evaluating Moral Cognition in Artificial Agents

Abstract: In developing artificial intelligence (AI), researchers often benchmark against human performance as a measure of progress. Is this kind of comparison possible for moral cognition? Given that human moral judgment often hinges on intangible properties like “intention” which may have no natural analog in artificial agents, it may prove difficult to design a “like-for-like” comparison between the moral behavior of artificial and human agents. What would a measure of moral behavior for both humans and AI look like…

Cited by 2 publications (2 citation statements)
References 47 publications
“…The postprocessing steps aim to reduce the raw model’s propensity to produce toxic responses as well as to make it implement a consistent “personality” in accord with product design goals. These steps are not always entirely effective in preventing LLMs from producing undesirable behaviors like toxic or harmful language, and “jailbreak” prompts which trick the model into responding inappropriately are still easy to discover and implement [47, 48].…”
Section: Introduction
Confidence: 99%
“…Some letters developed innovative ideas about core aspects of cognition, such as the nature of belief (Van Leeuwen & Lombrozo, 2023), perception and attention (Cleary, Irving, & Mills, 2023; Elber-Dorozko & Loewenstein, 2023; Yu & Lau, 2023), language and learning (Cohn & Schilperoord, 2022; Kapatsinski, 2023; Smalle & Möttönen, 2023), and reasoning and other aspects of high-level cognition (Franco & Murawski, 2023; Pirrone & Tsetsos, 2023). A few letters highlight recent developments at the intersection between technology and cognitive science, such as the influential emergence of Large Language Models (Contreras Kallens et al., 2023), technologies to preserve languages (Bensemann, Brown, Witbrock, & Yogarajan, 2023), and artificial intelligence and moral cognition (Reinecke et al., 2023).…”
Confidence: 99%