2015
DOI: 10.1609/aaai.v29i1.9647
Multi-Objective MDPs with Conditional Lexicographic Reward Preferences

Abstract: Sequential decision problems that involve multiple objectives are prevalent. Consider for example a driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for conditional lexicographic preferences with slack. We analyze the convergence characteristics of LVI and establish it…
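As a rough sketch of what "conditional lexicographic preferences with slack" means (our notation, not taken from the paper): with objectives ordered $o_1 \succ o_2 \succ \cdots \succ o_k$ and per-objective slack $\delta_i \ge 0$, the acceptable policy sets shrink one objective at a time:

$\Pi_1 = \{ \pi : V_1^{\pi} \ge \max_{\pi'} V_1^{\pi'} - \delta_1 \}, \qquad \Pi_{i+1} = \{ \pi \in \Pi_i : V_{i+1}^{\pi} \ge \max_{\pi' \in \Pi_i} V_{i+1}^{\pi'} - \delta_{i+1} \}.$

A policy in $\Pi_k$ trades at most $\delta_i$ of each higher-priority objective for gains further down the order; "conditional" refers to letting the ordering itself depend on the state.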

Cited by 33 publications (25 citation statements); references 18 publications. Citing publications span 2015–2024.
“…We formalise this as a multi-objective Markov decision process (MOMDP). We note that more complex models exist, such as a multi-objective partially observable Markov decision process [110,160,161,202] and multi-objective multi-agent systems [126]. However, the MOMDP formalisation allows us to study many relevant aspects of multi-objective decision making problems, while also being simple to understand.…”
Section: Problem Setting
confidence: 99%
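To make the MOMDP formalisation mentioned in this excerpt concrete, here is a minimal sketch in Python (the class name, field names, and array shapes are our own assumptions, not taken from either paper):

import numpy as np
from dataclasses import dataclass

@dataclass
class MOMDP:
    # A multi-objective MDP is an MDP whose reward is vector-valued:
    # one reward function per objective instead of a single scalar.
    T: np.ndarray   # transitions, shape (n_states, n_actions, n_states)
    R: np.ndarray   # rewards, shape (n_objectives, n_states, n_actions)
    gamma: float    # discount factor in [0, 1)

    @property
    def n_objectives(self) -> int:
        return self.R.shape[0]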
“…They show that the non-linear nature of this utility prevents direct adaptation of methods like dynamic programming, which are based on the Bellman equation, and instead develop a non-linear programming solution for this task. Meanwhile, Wray et al. [203] identify Lexicographic MDPs as a specific subset of MOMDPs in which there is a specified ordering over objectives. They develop methods based on value iteration for solving such tasks, allowing the ordering of objectives to be state-dependent and incorporating the concept of slack, which allows some degree of loss in the primary objective in order to obtain gains in secondary objectives.…”
Section: Multi-objective Planning Algorithms
confidence: 99%
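The slack mechanism described in this excerpt can be sketched as follows. This is our illustrative reconstruction in Python (building on the hypothetical MOMDP class above), not the authors' LVI code, and it omits the state-dependent ordering:

import numpy as np

def lexicographic_vi(momdp, slack, n_iter=1000):
    # Solve objectives in priority order; slack[i] is how much value
    # each state may sacrifice on objective i so that lower-priority
    # objectives can be improved.  Illustrative sketch only.
    n_states, n_actions = momdp.T.shape[0], momdp.T.shape[1]
    allowed = [list(range(n_actions)) for _ in range(n_states)]
    for i in range(momdp.n_objectives):
        V = np.zeros(n_states)
        for _ in range(n_iter):
            Q = momdp.R[i] + momdp.gamma * momdp.T @ V   # shape (S, A)
            V = np.array([Q[s, allowed[s]].max() for s in range(n_states)])
        # Keep only actions within slack[i] of the best for objective i.
        Q = momdp.R[i] + momdp.gamma * momdp.T @ V
        allowed = [[a for a in allowed[s]
                    if Q[s, a] >= Q[s, allowed[s]].max() - slack[i]]
                   for s in range(n_states)]
    # Any remaining action is acceptable for all objectives; pick one.
    return np.array([allowed[s][0] for s in range(n_states)])

Setting every slack[i] to zero recovers a strict lexicographic ordering over the objectives.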
“…As with other forms of slack (Wray and Zilberstein 2015a), this algorithm only guarantees that the final joint policy $\pi^*$ has a value $V_0^{\pi^*}$ that is within $\delta$ of the more preferred objective's value $V_0^*$, which in this case is approximate because of the fixed set of controller nodes. It is not within slack of the true optimal value $V_0^*$, since we obviously did not compute that; $\pi^*$ is an approximate solution after all.…”
Section: Scalable Solution for CCPs
confidence: 99%
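In symbols, the quoted guarantee reads (our transcription, with objective 0 as the most preferred): $V_0^{\pi^*}(s) \ge V_0^{*}(s) - \delta$ for all states $s$, where $V_0^*$ here denotes the best value achievable with the fixed set of controller nodes rather than the true optimum.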
“…Our implementation uses Python 3.4.3 with scikit-learn 0.16.1, NumPy 1.9.2, and SciPy 0.15.1, run on an Intel(R) Core(TM) i7-4702HQ CPU at 2.20GHz with 8GB of RAM and an Nvidia(R) GeForce GTX 870M. We leverage a high-performing GPU-based implementation of PBVI using CUDA 6.5 (Wray and Zilberstein 2015a; 2015b). We compare our algorithm with the three original decision-theoretic algorithms designed for reluctant, fallible, and cost-varying oracles, denoted PAL #1, #2, and #3, respectively (Donmez and Carbonell 2008a).…”
Section: Experimentation
confidence: 99%