2016
DOI: 10.1613/jair.4961

Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation

Abstract: Many real-world control applications, from economics to robotics, are characterized by the presence of multiple conflicting objectives. In these problems, the standard concept of optimality is replaced by Pareto-optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an important challenge. In this paper, we pro…
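The abstract's key notion is Pareto-optimality: a solution lies on the Pareto frontier if no other solution improves one objective without worsening another. As a minimal illustration of that definition (not code from the paper), here is a non-dominated filter over candidate vector returns, assuming every objective is to be maximized:

```python
import numpy as np

def pareto_front(returns):
    """Indices of the non-dominated rows of `returns`.

    `returns` is an (n_solutions, n_objectives) array in which every
    objective is maximized; a row is dominated if some other row is at
    least as good in every objective and strictly better in at least one.
    """
    returns = np.asarray(returns, dtype=float)
    keep = []
    for i in range(returns.shape[0]):
        others = np.delete(returns, i, axis=0)
        dominated = np.any(
            np.all(others >= returns[i], axis=1) &
            np.any(others > returns[i], axis=1)
        )
        if not dominated:
            keep.append(i)
    return keep

# Three trade-offs between two conflicting objectives: the last point is
# dominated by the second, so only the first two are Pareto-optimal.
print(pareto_front([[1.0, 5.0], [3.0, 3.0], [2.0, 2.0]]))  # -> [0, 1]
```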

Cited by 38 publications (19 citation statements). References 49 publications.
“…Multi-policy approaches seek to find a set of policies that cover the Pareto front. Various techniques exist, for instance repeatedly calling a single-policy approach with strategically-chosen trade-off settings [41,27,55], simultaneously learning a set of policies by using a multi-objective variant of Q-learning [26,39,54], learning a manifold of policies in parameter space [32,33], or combining single-policy approaches with an overarching objective [53].…”
Section: Related Work
mentioning (confidence: 99%)
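The first family mentioned in the statement above, repeatedly calling a single-policy approach with strategically chosen trade-off settings, is the simplest to sketch. In the sketch below, `solve_scalarized` is a placeholder for any single-objective RL algorithm and is not an API from the cited works; two objectives and linear scalarization are assumed:

```python
import numpy as np

def weight_sweep(solve_scalarized, n_points=11):
    """Approximate a two-objective Pareto front by sweeping weights (w, 1 - w).

    `solve_scalarized(weights)` is assumed to train a policy on the scalar
    reward `weights @ reward_vector` and to return (policy, vector_return).
    """
    solutions = []
    for w in np.linspace(0.0, 1.0, n_points):
        weights = np.array([w, 1.0 - w])
        policy, vector_return = solve_scalarized(weights)
        solutions.append((weights, policy, vector_return))
    return solutions
```

A known limitation of linear scalarization is that it can only recover solutions on the convex hull of the Pareto front, which is one motivation for the multi-objective Q-learning and manifold-based alternatives listed in the same statement.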
“…In current research on multi-objective optimization (MOO) problems and multi-objective reinforcement learning (MORL), the standard concept of optimality is replaced by Pareto optimality [8,18,20]. In [18], Parisi et al. formulated a policy-gradient RL method to learn the Pareto frontier in multi-objective Markov decision problems, where a continuous approximation of the Pareto frontier is generated at each gradient-ascent step. In [20], Ruiz et al. discussed a MORL method that generates deterministic non-dominated policies for non-convex Pareto frontiers in multi-objective Markov decision problems.…”
Section: Related Work
mentioning (confidence: 99%)
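The gradient-based strategy described in [18] relies on ascent steps on a scalarization of the vector-valued expected return. Below is a minimal REINFORCE-style sketch of one such ascent direction; the trajectory interface is an assumption made for illustration, not the estimator used in the cited work:

```python
import numpy as np

def scalarized_policy_gradient(trajectories, weights):
    """Estimate an ascent direction on the scalarized return w^T J(theta).

    Each trajectory is assumed to be a list of (grad_log_pi, reward_vector)
    pairs collected under the current policy; this interface is illustrative.
    """
    grad = 0.0
    for trajectory in trajectories:
        grads, rewards = zip(*trajectory)
        # Scalarize the trajectory's vector-valued return with the weights.
        scalar_return = float(np.asarray(weights) @ np.sum(rewards, axis=0))
        # REINFORCE estimator: sum of score functions times the scalar return.
        grad = grad + np.sum(grads, axis=0) * scalar_return
    return grad / len(trajectories)
```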
“…Two new manifold-based algorithms that combine episodic exploration and importance sampling were proposed to efficiently learn a manifold in the policy parameter space such that its image in the objective space accurately approximates the Pareto frontier (Parisi, Pirotta, & Peters, 2017). There are also numerous other gradient-based methods to solve multi-objective optimization (Pirotta, Parisi, & Restelli, 2015; Parisi, Pirotta, & Restelli, 2016; Parisi et al., 2014a, 2014b; Pinder, 2016). For example, policy gradient techniques were developed to approximate the Pareto frontier in multi-objective Markov decision processes (Pirotta et al., 2015; Parisi et al., 2016, 2014a, 2014b).…”
Section: Previous Work
mentioning (confidence: 99%)
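To make the "manifold in the policy parameter space" idea concrete: a low-dimensional coordinate is mapped to full policy parameters, and the image of that map in objective space is then compared against the Pareto frontier. The linear parameterization and the `evaluate_policy` callback below are illustrative assumptions, not the algorithm of Parisi, Pirotta, and Peters (2017):

```python
import numpy as np

def policy_manifold(rho, t):
    """Map a manifold coordinate t in [0, 1] to policy parameters theta.

    For illustration the manifold is a linear interpolation between two
    anchor parameter vectors rho = (theta_a, theta_b); in practice a
    richer, learnable parameterization would be used.
    """
    theta_a, theta_b = rho
    return (1.0 - t) * np.asarray(theta_a) + t * np.asarray(theta_b)

def frontier_image(rho, evaluate_policy, n_samples=21):
    """Sample the manifold and return its image in objective space.

    `evaluate_policy(theta)` is assumed to estimate the vector-valued
    expected return of the policy with parameters theta (e.g. via Monte
    Carlo rollouts). Learning then amounts to adjusting rho so that this
    image covers the Pareto frontier as accurately as possible.
    """
    ts = np.linspace(0.0, 1.0, n_samples)
    return np.array([evaluate_policy(policy_manifold(rho, t)) for t in ts])
```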
“…There are also numerous other gradient-based methods to solve multi-objective optimization (Pirotta, Parisi, & Restelli, 2015; Parisi, Pirotta, & Restelli, 2016; Parisi et al., 2014a, 2014b; Pinder, 2016). For example, policy gradient techniques were developed to approximate the Pareto frontier in multi-objective Markov decision processes (Pirotta et al., 2015; Parisi et al., 2016, 2014a, 2014b). Note that an explicit quantitative study of the interobjective relationship is also lacking in these studies.…”
Section: Previous Work
mentioning (confidence: 99%)