“…An open question in AI research and development is how to represent and specify ethical choices and constraints for this class of technologies in computational terms [Ajmeri et al., 2020; Amodei et al., 2016; Awad et al., 2022; Dignum, 2017; Yu et al., 2018; Wallach, 2010]. In particular, there is increasing interest in understanding how certain types of behavior and outcomes might emerge from the interactions of learning agents in artificial societies [de Cote et al., 2006; Foerster et al., 2018; Hughes et al., 2018; Jaques et al., 2019; Leibo et al., 2017; McKee et al., 2020; Peysakhovich and Lerer, 2018a; Peysakhovich and Lerer, 2018b; Rodriguez-Soto et al., 2021; Sandholm and Crites, 1996] and in interactive systems where humans are in the loop [Carroll et al., 2019; Rahwan et al., 2019]. We believe that a promising and insightful starting point is the analysis of the emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in situations where there is tension between individual interest and collective social outcomes, namely in social dilemmas [Axelrod and Hamilton, 1981; Rapoport, 1974; Sigmund, 2010].…”

Version with Appendix: https://arxiv.org/abs/2301.08491
Code: https://github.com/Liza-Tennant/moral_choice_dyadic
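
As a minimal illustrative sketch of this kind of setup (not the authors' implementation), the following Python snippet pairs two tabular Q-learners in an iterated Prisoner's Dilemma and gives one of them a hypothetical "deontological" moral reward: its game payoff minus a fixed penalty whenever it defects against a cooperator. The payoff values, penalty rule, and learning parameters are all assumptions chosen for illustration.

    import random

    # Payoff matrix for the row player in the Prisoner's Dilemma.
    # Actions: 0 = cooperate (C), 1 = defect (D).
    PAYOFF = {
        (0, 0): 3,  # mutual cooperation (R)
        (0, 1): 0,  # cooperating against a defector (S)
        (1, 0): 5,  # defecting against a cooperator (T)
        (1, 1): 1,  # mutual defection (P)
    }

    def moral_reward(own_action, opponent_action, game_payoff, penalty=4.0):
        """Illustrative 'deontological' shaping (an assumption of this sketch):
        the agent keeps its game payoff but pays a fixed penalty whenever it
        defects against a cooperator."""
        if own_action == 1 and opponent_action == 0:
            return game_payoff - penalty
        return game_payoff

    class QLearner:
        """Tabular Q-learner over a single stateless choice (C or D)."""
        def __init__(self, alpha=0.1, epsilon=0.1):
            self.q = [0.0, 0.0]  # Q-values for cooperate / defect
            self.alpha = alpha
            self.epsilon = epsilon

        def act(self):
            if random.random() < self.epsilon:
                return random.randrange(2)  # explore
            return max(range(2), key=lambda a: self.q[a])  # exploit

        def update(self, action, reward):
            self.q[action] += self.alpha * (reward - self.q[action])

    # Two agents learn against each other; only agent A's reward is shaped.
    a, b = QLearner(), QLearner()
    for _ in range(10_000):
        act_a, act_b = a.act(), b.act()
        pay_a, pay_b = PAYOFF[(act_a, act_b)], PAYOFF[(act_b, act_a)]
        a.update(act_a, moral_reward(act_a, act_b, pay_a))  # moral reward
        b.update(act_b, pay_b)                              # purely selfish
    print("Q-values A (C, D):", a.q)
    print("Q-values B (C, D):", b.q)

Under these assumed values the penalty makes defecting against a cooperator pay 1 rather than 5, so the shaped agent's learned Q-value for defection tends to be depressed relative to the selfish agent's, illustrating how a predefined moral reward can shift learned behavior away from the individually dominant strategy in a social dilemma.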