2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
DOI: 10.1109/adprl.2014.7010622
Model-based multi-objective reinforcement learning

Abstract: This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorithm first learns a model of the multi-objective sequential decision making problem, after which this learned model is used by a multi-objective dynamic programming method to compute Pareto optimal policies. The advantage of this model-based multi-objective reinforcement learning method is that once an accurate model has been estimated from the experiences of an agent in some environment, the dynamic programming…
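The abstract describes a two-phase pipeline: estimate a model of the multi-objective problem, then run multi-objective dynamic programming on that model. As a rough illustration of the second phase only (not the paper's exact algorithm; the toy setting and all names below are invented), the following Python sketch performs set-based value iteration on a deterministic MDP with vector-valued rewards, pruning each state's value set to its Pareto-non-dominated vectors.

import numpy as np

def dominates(u, v):
    # True if value vector u Pareto-dominates v (>= everywhere, > somewhere).
    return np.all(u >= v) and np.any(u > v)

def prune(vectors):
    # Keep only Pareto-non-dominated value vectors (duplicates collapsed).
    kept = [v for i, v in enumerate(vectors)
            if not any(dominates(u, v) for j, u in enumerate(vectors) if j != i)]
    return [np.array(t) for t in {tuple(np.round(v, 6)) for v in kept}]

def mo_value_iteration(next_state, reward, n_states, n_actions, gamma=0.95, iters=50):
    # next_state[s][a] -> successor state, reward[s][a] -> reward vector.
    V = [[np.zeros(len(reward[0][0]))] for _ in range(n_states)]
    for _ in range(iters):
        V = [prune([reward[s][a] + gamma * v
                    for a in range(n_actions)
                    for v in V[next_state[s][a]]])
             for s in range(n_states)]
    return V

Restricting the sketch to deterministic transitions keeps each backup a simple union-and-prune; stochastic transitions require combining successor value sets, which is where full multi-objective dynamic programming methods become considerably more involved.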

Cited by 34 publications (16 citation statements) · References 15 publications
“…Otherwise an action is selected randomly. This has been the predominant exploration approach adopted in the MORL literature so far [12,15,16,19,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45].…”
Section: Exploration in Multi-objective RL
Citation type: mentioning · Confidence: 99%
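The excerpt above refers to the standard epsilon-greedy scheme. A minimal sketch in Python, assuming the vector-valued Q estimates are ranked by a linear scalarisation (that choice, and every name below, is an assumption for illustration rather than something fixed by the cited works):

import numpy as np

def epsilon_greedy(q_vectors, weights, epsilon, rng=None):
    # q_vectors: (n_actions, n_objectives) Q estimates for the current state.
    # weights:   linear scalarisation weights used only to rank actions.
    # With probability epsilon a random action is taken; otherwise the action
    # maximising the scalarised value is chosen.
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_vectors)))
    return int(np.argmax(np.asarray(q_vectors) @ np.asarray(weights)))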
“…The DST has been widely adopted as a benchmark (e.g. [15,32,43,44]). The agent controls a submarine which starts from a location near the shore and travels out to sea to retrieve treasure.…”
Section: Deep Sea Treasure
Citation type: mentioning · Confidence: 99%
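For readers unfamiliar with the benchmark, a simplified Deep-Sea-Treasure-style environment can be sketched as below. The grid layout and treasure values are illustrative only; exact parameterisations differ between papers.

import numpy as np

class DeepSeaTreasure:
    # Two objectives per step: [treasure value, time penalty of -1].
    # Treasure values and depths below are illustrative, not canonical.
    treasures = [1, 2, 3, 5, 8, 16, 24, 50, 74, 124]
    depths = [1, 2, 3, 4, 4, 4, 7, 7, 9, 10]
    n_actions = 4  # 0=up, 1=down, 2=left, 3=right

    def reset(self):
        self.row, self.col = 0, 0
        return (self.row, self.col)

    def step(self, action):
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        self.row = int(np.clip(self.row + dr, 0, max(self.depths)))
        self.col = int(np.clip(self.col + dc, 0, len(self.treasures) - 1))
        done = self.row >= self.depths[self.col]      # reached the sea floor
        treasure = float(self.treasures[self.col]) if done else 0.0
        return (self.row, self.col), np.array([treasure, -1.0]), done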
“…Our current implementation uses a very simple scalarization method to solve a multi-objective problem. There are many techniques designed to allow agents to more easily solve multi-objective problems [33], some of which might be used to enhance the performance of our controller [34]. Currently, our reward is a linear combination of a set of soft constraints, multiplied by the AND-operation of all hard constraints.…”
Section: Multi-objective Reinforcement Learning
Citation type: mentioning · Confidence: 99%
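The reward construction described in this excerpt, a weighted sum of soft-constraint terms gated by the logical AND of all hard constraints, can be written directly. The argument names below are placeholders rather than the cited controller's actual variables.

import numpy as np

def constrained_reward(soft_terms, soft_weights, hard_satisfied):
    # Returns 0 unless every hard constraint holds; otherwise the weighted
    # sum of the soft-constraint terms.
    gate = float(all(hard_satisfied))
    return gate * float(np.dot(soft_weights, soft_terms))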
“…Wiering et al. use a two-stage approach to learn the set of optimal policies [33] that are applicable in the Deep Sea Treasure problem. First, an agent explores the environment, attempting to learn a model of the environment.…”
Section: Multi-objective Reinforcement Learning
Citation type: mentioning · Confidence: 99%
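The first stage described here, exploring to learn a model, amounts to maximum-likelihood estimation from visit counts plus running averages of the vector rewards. The sketch below is a generic version of that idea, not the authors' implementation.

from collections import defaultdict
import numpy as np

class TabularModel:
    # Maximum-likelihood tabular model: transition probabilities from counts,
    # mean vector rewards per state-action pair.
    def __init__(self, n_objectives):
        self.counts = defaultdict(lambda: defaultdict(int))    # (s, a) -> s' -> n
        self.reward_sums = defaultdict(lambda: np.zeros(n_objectives))
        self.visits = defaultdict(int)

    def update(self, s, a, reward_vec, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sums[(s, a)] += np.asarray(reward_vec, dtype=float)
        self.visits[(s, a)] += 1

    def transition_probs(self, s, a):
        total = self.visits[(s, a)]
        return {s2: n / total for s2, n in self.counts[(s, a)].items()}

    def mean_reward(self, s, a):
        return self.reward_sums[(s, a)] / max(self.visits[(s, a)], 1)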
“…However, the planning methods and learning methods are not entirely disjoint; when the agent explicitly learns a model of the environment through its interaction, it can use a planning method in order to produce a coverage set. Such model-based learning has been investigated extensively in single-objective settings, and has recently been introduced to multi-objective settings as well (Wiering et al, 2014). As such, the methods proposed in this dissertation can be employed as planning subroutines inside a model-based learning algorithm.…”
Citation type: mentioning · Confidence: 99%
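A generic driver for this kind of model-based learning, pure exploration followed by planning on the learned model, could look roughly like the sketch below. The env, model and plan_coverage_set interfaces are assumptions used only to make the control flow concrete.

import numpy as np

def model_based_morl(env, model, plan_coverage_set, n_episodes=500, rng=None):
    # Stage 1: interact with random actions and update the learned model.
    # Stage 2: hand the model to a planning subroutine returning a coverage set.
    rng = rng or np.random.default_rng()
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            a = int(rng.integers(env.n_actions))
            s_next, reward_vec, done = env.step(a)
            model.update(s, a, reward_vec, s_next)
            s = s_next
    return plan_coverage_set(model)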