2017
DOI: 10.1016/j.neucom.2016.08.152

Steering approaches to Pareto-optimal multiobjective reinforcement learning

Abstract: For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly…
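The abstract's central point is that non-stationary mixing between deterministic policies can reach trade-offs that no single stationary deterministic policy achieves. The sketch below illustrates that general steering idea only; it is not the paper's w-steering or Q-steering algorithm, and the base return vectors, target point, and function name are illustrative assumptions.

```python
# Minimal sketch (not the paper's w-steering or Q-steering): a non-stationary
# policy that alternates between two fixed deterministic base policies so the
# long-run average reward vector is steered toward a target point that neither
# base policy reaches on its own. Return vectors and target are assumptions.
import numpy as np

def steer_average_reward(base_returns, target, episodes=1000):
    """Each episode, run the base policy whose return vector moves the running
    average reward closest to the target point (greedy geometric steering)."""
    avg = np.zeros_like(target, dtype=float)
    mix = np.zeros(len(base_returns))
    for t in range(1, episodes + 1):
        # Pick the base policy minimising distance to the target after the update.
        best = min(
            range(len(base_returns)),
            key=lambda i: np.linalg.norm((avg * (t - 1) + base_returns[i]) / t - target),
        )
        mix[best] += 1
        avg += (base_returns[best] - avg) / t  # incremental mean of episode returns
    return avg, mix / episodes

# Two deterministic policies with per-episode returns (objective 1, objective 2).
base_returns = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
avg, mix = steer_average_reward(base_returns, target=np.array([0.7, 0.3]))
print(avg, mix)  # average reward approaches (0.7, 0.3); mixture is roughly 70% / 30%
```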

Cited by 25 publications (20 citation statements)
References 23 publications

“…Otherwise an action is selected randomly. This has been the predominant exploration approach adopted in the MORL literature so far [12,15,16,19,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45].…”
Section: Exploration in Multiobjective RL (mentioning)
Confidence: 99%
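The excerpt above refers to epsilon-greedy exploration. A minimal sketch of that scheme in a multiobjective setting follows, assuming linear scalarisation of vector-valued Q-values; the weight vector, Q-value layout, and function name are illustrative and not drawn from any specific paper in the citation list.

```python
# Hedged sketch of the epsilon-greedy exploration scheme the excerpt describes,
# adapted to multiobjective RL via linear scalarisation. Names and the weight
# vector are illustrative assumptions.
import numpy as np

def epsilon_greedy_action(q_values, weights, epsilon, rng=None):
    """q_values: array of shape (n_actions, n_objectives) for the current state.
    With probability epsilon an action is chosen uniformly at random; otherwise
    the action maximising the scalarised value w . Q(s, a) is selected."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(q_values.shape[0]))  # explore: uniform random action
    return int(np.argmax(q_values @ weights))        # exploit: greedy on scalarised Q

# Example: 3 actions, 2 objectives, weights favouring the first objective.
Q_s = np.array([[1.0, 0.2], [0.4, 0.9], [0.6, 0.6]])
action = epsilon_greedy_action(Q_s, weights=np.array([0.8, 0.2]), epsilon=0.1)
```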
“…In this issue, all the papers use benchmark environments with two or three objectives. The Deep Sea Treasure task [2,3,6] is a bi-objective environment with ten Pareto-optimal states, which has often been used for testing MORL algorithms. The Bonus World used in [7] is an original three-objective environment.…”
Mentioning, confidence: 99%

“…The Bonus World used in [7] is an original three-objective environment. Another bi-objective environment that has been used to evaluate a novel multi-objective RL algorithm is the Linked Rings problem [3]. Some of the environments used have continuous state variables.…”
Mentioning, confidence: 99%