2022
DOI: 10.1007/s10676-022-09635-0

Instilling moral value alignment by means of multi-objective reinforcement learning

Abstract: AI research is being challenged with ensuring that autonomous agents learn to behave ethically, namely in alignment with moral values. Here, we propose a novel way of tackling the value alignment problem as a two-step process. The first step consists of formalising moral values and value-aligned behaviour based on philosophical foundations. Our formalisation is compatible with the framework of (Multi-Objective) Reinforcement Learning, to ease the handling of an agent’s individual and ethical objectives. The se…
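The abstract's framing can be read as: the agent receives a vector of rewards, one individual and one ethical, which is then embedded into a single learning signal. A minimal sketch of that idea follows; the class and function names and the linear scalarisation are illustrative assumptions, not the paper's exact formalisation.

```python
# Minimal sketch (assumptions, not the paper's exact formulation): an agent
# receives a vectorial reward with an individual and an ethical component,
# and a linear scalarisation turns it into a single learning signal.
from dataclasses import dataclass

@dataclass
class VectorReward:
    individual: float  # task/self-interested payoff
    ethical: float     # reward for value-aligned behaviour (praise minus blame)

def scalarise(r: VectorReward, ethical_weight: float) -> float:
    """Linear scalarisation: R = R_individual + w * R_ethical.

    Choosing the ethical weight large enough is what makes ethically optimal
    behaviour also optimal for the combined objective.
    """
    return r.individual + ethical_weight * r.ethical

# Example: a selfish gain of 1.0 paired with a blameworthy action (-1.0)
# becomes unattractive once the ethical weight exceeds 1.0.
print(scalarise(VectorReward(individual=1.0, ethical=-1.0), ethical_weight=2.0))  # -1.0
```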

Cited by 18 publications (15 citation statements)
References: 29 publications
“…Specification requires a normative decision of what 'moral AI' looks like. This can be turned into concrete design objectives and training tasks (e.g., Rodriguez-Soto et al, 2022). However, even AI systems trained according to ideal specifications can fail in practice.…”
Section: The Role of Analysis in AI Research (mentioning, confidence: 99%)
“…An open question in AI research and development is how to represent and specify ethical choices and constraints for this class of technologies in computational terms [Ajmeri et al, 2020; Amodei et al, 2016; Awad et al, 2022; Dignum, 2017; Yu et al, 2018; Wallach, 2010]. In particular, there is an increasing interest in understanding how certain types of behavior and outcomes might emerge from the interactions of learning agents in artificial societies [de Cote et al, 2006; Foerster et al, 2018; Hughes et al, 2018; Jaques et al, 2019; Leibo et al, 2017; McKee et al, 2020; Peysakhovich and Lerer, 2018a; Peysakhovich and Lerer, 2018b; Rodriguez-Soto et al, 2021; Sandholm and Crites, 1996] and in interactive systems where humans are in the loop [Carroll et al, 2019; Rahwan et al, 2019]. We believe that a promising and insightful starting point is the analysis of emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in situations where there is tension between individual interest and collective social outcomes, namely in social dilemmas [Axelrod and Hamilton, 1981; Rapoport, 1974; Sigmund, 2010].…” (Version with Appendix: https://arxiv.org/abs/2301.08491; Code: https://github.com/Liza-Tennant/moral choice dyadic)
Section: Introduction (mentioning, confidence: 99%)
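The social-dilemma setting referenced above can be made concrete with a toy Prisoner's Dilemma in which an intrinsic moral reward is added to the selfish payoff. The payoff values and the moral bonus below are illustrative assumptions, not the cited authors' parameters.

```python
# Hypothetical sketch of the kind of setting the citing work describes:
# a one-shot Prisoner's Dilemma payoff matrix, with an intrinsic 'moral'
# reward added on top of the selfish payoff.
C, D = "cooperate", "defect"

# (my_payoff, opponent_payoff) under the standard T > R > P > S ordering.
PAYOFFS = {
    (C, C): (3, 3),  # mutual cooperation (R)
    (C, D): (0, 5),  # sucker's payoff vs temptation (S, T)
    (D, C): (5, 0),
    (D, D): (1, 1),  # mutual defection (P)
}

def moral_reward(opponent_payoff: float, weight: float = 0.5) -> float:
    """Toy utilitarian-style intrinsic reward: value the opponent's outcome."""
    return weight * opponent_payoff

def total_reward(my_action: str, opp_action: str) -> float:
    mine, theirs = PAYOFFS[(my_action, opp_action)]
    return mine + moral_reward(theirs)

# With this shaping, mutual cooperation (3 + 1.5) still loses to defecting
# against a cooperator (5 + 0) unless the moral weight is raised further --
# exactly the tension between individual and collective outcomes at stake.
print(total_reward(C, C), total_reward(D, C))  # 4.5 5.0
```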
“…We map choices to an intrinsic reward system [Chentanez et al, 2004] according to these ethical frameworks. The majority of existing modeling work has focused on single types of social preference [Hughes et al, 2018;Jaques et al, 2019;Kleiman-Weiner et al, 2017;Peysakhovich and Lerer, 2018b;Peysakhovich and Lerer, 2018a]. However, as suggested by the continued debate between the three moral frameworks [Mabille and Stoker, 2021], and by evidence from human moral psychology [Bentahila et al, 2021;Graham et al, 2009], a broad range of moral preferences are likely to exist within and across societies (especially in preferences regarding AI morality [Awad et al, 2018]).…”
Section: Introduction (mentioning, confidence: 99%)
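As an illustration of "mapping choices to an intrinsic reward system", the sketch below rates the same joint outcome under three hand-written reward functions inspired by different moral preferences. The specific functions are assumptions for exposition, not the cited work's implementation.

```python
# Illustrative sketch (not the cited authors' implementation): the same
# choice maps to different intrinsic rewards depending on the moral
# preference the agent is endowed with.
def utilitarian_reward(my_payoff: float, opponent_payoff: float) -> float:
    # Consequentialist: value the collective outcome.
    return my_payoff + opponent_payoff

def deontological_reward(my_action: str, opponent_last_action: str) -> float:
    # Rule-based: penalise defecting against a cooperator, regardless of payoff.
    return -5.0 if (my_action == "defect" and opponent_last_action == "cooperate") else 0.0

def egalitarian_reward(my_payoff: float, opponent_payoff: float) -> float:
    # Fairness-oriented: penalise inequality between the two players.
    return -abs(my_payoff - opponent_payoff)

# The same joint outcome (I defect, they cooperated last round, payoffs 5 vs 0)
# is rated very differently by the three reward functions.
print(utilitarian_reward(5, 0),
      deontological_reward("defect", "cooperate"),
      egalitarian_reward(5, 0))  # 5 -5.0 -5
```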
“…Our proposal ensures the conversational agent learns to behave ethically by applying ethical embedding, a reinforcement learning approach (see e.g., [18]). This methodology for instilling moral value alignment is founded in the framework of Multi-Objective Reinforcement Learning [20] and the philosophical consideration of values [3] as ethical principles that discern good from bad, and express what ought to be promoted.…”
Section: Introduction (mentioning, confidence: 99%)
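The multi-objective reinforcement learning machinery this statement refers to can be sketched, under assumptions, as scalarised multi-objective Q-learning: one value table per objective, with the policy driven by a weighted sum of the two. The environment interface, table sizes, and weight value below are hypothetical, and real use would add exploration (e.g., epsilon-greedy).

```python
# Hedged sketch of scalarised multi-objective Q-learning: one Q-table per
# objective (individual, ethical), with action selection driven by a
# weighted sum of the two value estimates.
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma, ethical_weight = 0.1, 0.95, 2.0

Q_ind = np.zeros((n_states, n_actions))   # individual objective
Q_eth = np.zeros((n_states, n_actions))   # ethical objective

def select_action(state: int) -> int:
    # Greedy with respect to the scalarised value estimate.
    return int(np.argmax(Q_ind[state] + ethical_weight * Q_eth[state]))

def update(state: int, action: int, r_ind: float, r_eth: float, next_state: int) -> None:
    # Each objective keeps its own temporal-difference update;
    # only the action-selection policy mixes them.
    best_next = select_action(next_state)
    Q_ind[state, action] += alpha * (r_ind + gamma * Q_ind[next_state, best_next] - Q_ind[state, action])
    Q_eth[state, action] += alpha * (r_eth + gamma * Q_eth[next_state, best_next] - Q_eth[state, action])
```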