2021
DOI: 10.48550/arxiv.2106.08199
Preprint

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

Abstract: Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives, or constraints, in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors when learning from experts or in offline RL. Often, task reward and auxiliary objectives are in conflict with each other and it is therefore na…
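As an illustration of the framing in the abstract, here is a minimal sketch (PyTorch; hypothetical names and fixed weights, not code from the paper) of a policy loss that combines the task-reward term with two common auxiliary objectives, an entropy bonus and a KL penalty toward a teacher or prior policy:

```python
# Minimal sketch (hypothetical names, not the paper's method): a policy loss mixing
# the task-reward objective with two auxiliary objectives, an entropy bonus and a
# KL penalty toward a frozen teacher/prior policy.
import torch
import torch.distributions as D

def multi_objective_policy_loss(logits, teacher_logits, actions, advantages,
                                w_entropy=0.01, w_teacher=0.1):
    """logits/teacher_logits: [B, A] action logits; actions: [B]; advantages: [B]."""
    pi = D.Categorical(logits=logits)
    teacher = D.Categorical(logits=teacher_logits.detach())

    pg_loss = -(advantages.detach() * pi.log_prob(actions)).mean()  # task-reward term
    entropy_bonus = pi.entropy().mean()                             # exploration bonus
    kl_to_teacher = D.kl_divergence(pi, teacher).mean()             # stay close to the teacher

    # The objectives can conflict; here they are simply combined with fixed weights.
    return pg_loss - w_entropy * entropy_bonus + w_teacher * kl_to_teacher
```

The fixed scalar weights here stand in for whatever trade-off mechanism a multi-objective method would actually use to resolve conflicts between these terms.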

Cited by 7 publications (10 citation statements)
References 21 publications
“…This is particularly concerning given that Silver et al are highly influential researchers and employed at DeepMind, one of the organisations best equipped to expand the frontiers of AGI. While Silver et al "hope that other researchers will join us on our quest", we instead hope that the creation of AGI based on reward maximisation is tempered by other researchers with an understanding of the issues of AI safety [45,47] and an appreciation of the benefits of multi-objective agents [1,2].…”
Section: Discussion (mentioning)
confidence: 99%
“…[9], [10], [11], [12], [13], [14]. While most studies only consider learning over simulated environments and low-dimensional state features, recent studies that do show successful learning from high-dimensional observations (such as images) are actor-critic algorithms with constraints or regularization on the policy [10], [9], [12], [15]. Our methods build on some of these: the exponential advantage-weighted actor-critic formulation from CRR [10] and AWAC [9].…”
Section: Related Work (mentioning)
confidence: 99%
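As context for the exponentially advantage-weighted actor-critic formulation this quote attributes to CRR [10] and AWAC [9], here is a minimal sketch of the weighted actor loss (hypothetical names; `beta` and the weight clip are assumed hyperparameters, not the cited authors' code):

```python
# Sketch of an exponentially advantage-weighted actor loss in the spirit of CRR/AWAC.
# Dataset actions are re-weighted by exp(advantage / beta), so the policy imitates
# mainly the better-than-average actions in the data.
import torch
import torch.distributions as D

def advantage_weighted_actor_loss(logits, dataset_actions, advantages,
                                  beta=1.0, max_weight=20.0):
    pi = D.Categorical(logits=logits)
    # Clipping keeps the exponential weights from exploding on large advantages.
    weights = torch.clamp(torch.exp(advantages.detach() / beta), max=max_weight)
    return -(weights * pi.log_prob(dataset_actions)).mean()
```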
“…[21] for an overview) in which the RL objective is augmented to maximize reward while staying close to a prior policy. This prior can be instantiated with a suboptimal teacher policy, which is often used for transfer learning [22], [15].…”
Section: Related Work (mentioning)
confidence: 99%
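The augmented objective described in this quote, maximizing reward while staying close to a prior policy, can be sketched as a per-step KL-regularized reward (assumed notation and names, not the cited authors' code):

```python
# Sketch of a KL-regularized reward: each environment reward is penalized by the
# divergence of the current policy from a fixed prior/teacher policy, so maximizing
# return trades off task reward against staying close to the prior.
import torch
import torch.distributions as D

def kl_regularized_rewards(rewards, logits, prior_logits, alpha=0.1):
    """rewards: [T]; logits/prior_logits: [T, A] per-step action logits."""
    pi = D.Categorical(logits=logits)
    prior = D.Categorical(logits=prior_logits.detach())
    kl = D.kl_divergence(pi, prior)   # [T] per-step divergence from the prior
    return rewards - alpha * kl       # augmented reward used in place of the task reward
```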
“…In [3], DQNs are again revisited with a focus on generalization across domains. MORL is cast as a mixture of expert synthesis problems with behaviour cloning in [17].…”
Section: Related Work (mentioning)
confidence: 99%