2015
DOI: 10.1609/aaai.v29i1.9647
Multi-Objective MDPs with Conditional Lexicographic Reward Preferences

Abstract: Sequential decision problems that involve multiple objectives are prevalent. Consider for example a driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for conditional lexicographic preferences with slack. We analyze the convergence characteristics of LVI and establish it…
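As a rough sketch of what "conditional lexicographic preferences with slack" means (our notation, not taken from the paper): with objectives ordered $o_1 \succ o_2 \succ \cdots \succ o_k$ and per-objective slack $\delta_i \ge 0$, the acceptable policy sets shrink one objective at a time:

$\Pi_1 = \{ \pi : V_1^{\pi} \ge \max_{\pi'} V_1^{\pi'} - \delta_1 \}, \qquad \Pi_{i+1} = \{ \pi \in \Pi_i : V_{i+1}^{\pi} \ge \max_{\pi' \in \Pi_i} V_{i+1}^{\pi'} - \delta_{i+1} \}.$

A policy in $\Pi_k$ trades at most $\delta_i$ of each higher-priority objective for gains further down the order; "conditional" refers to letting the ordering itself depend on the state.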

Cited by 33 publications (25 citation statements); references 18 publications. Citing publications span 2015–2024.
“…We formalise this as a multi-objective Markov decision process (MOMDP). We note that more complex models exist, such as a multi-objective partially observable Markov decision process [110,160,161,202] and multi-objective multi-agent systems [126]. However, the MOMDP formalisation allows us to study many relevant aspects of multi-objective decision making problems, while also being simple to understand.…”
Section: Problem Setting
confidence: 99%
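To make the MOMDP formalisation mentioned in this excerpt concrete, here is a minimal sketch in Python (the class name, field names, and array shapes are our own assumptions, not taken from either paper):

import numpy as np
from dataclasses import dataclass

@dataclass
class MOMDP:
    # A multi-objective MDP is an MDP whose reward is vector-valued:
    # one reward function per objective instead of a single scalar.
    T: np.ndarray   # transitions, shape (n_states, n_actions, n_states)
    R: np.ndarray   # rewards, shape (n_objectives, n_states, n_actions)
    gamma: float    # discount factor in [0, 1)

    @property
    def n_objectives(self) -> int:
        return self.R.shape[0]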
“…They show that the non-linear nature of this utility prevents direct adaptation of methods like dynamic programming, which are based on the Bellman equation, and instead develop a non-linear programming solution for this task. Meanwhile, Wray et al. [203] identify Lexicographic MDPs as a specific subset of MOMDPs in which there is a specified ordering over objectives. They develop methods based on value iteration for solving such tasks, allowing the ordering of objectives to be state-dependent and incorporating the concept of slack, which allows some degree of loss in the primary objective in order to obtain gains in secondary objectives.…”
Section: Multi-objective Planning Algorithms
confidence: 99%
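The slack mechanism described in this excerpt can be sketched as follows. This is our illustrative reconstruction in Python (building on the hypothetical MOMDP class above), not the authors' LVI code, and it omits the state-dependent ordering:

import numpy as np

def lexicographic_vi(momdp, slack, n_iter=1000):
    # Solve objectives in priority order; slack[i] is how much value
    # each state may sacrifice on objective i so that lower-priority
    # objectives can be improved.  Illustrative sketch only.
    n_states, n_actions = momdp.T.shape[0], momdp.T.shape[1]
    allowed = [list(range(n_actions)) for _ in range(n_states)]
    for i in range(momdp.n_objectives):
        V = np.zeros(n_states)
        for _ in range(n_iter):
            Q = momdp.R[i] + momdp.gamma * momdp.T @ V   # shape (S, A)
            V = np.array([Q[s, allowed[s]].max() for s in range(n_states)])
        # Keep only actions within slack[i] of the best for objective i.
        Q = momdp.R[i] + momdp.gamma * momdp.T @ V
        allowed = [[a for a in allowed[s]
                    if Q[s, a] >= Q[s, allowed[s]].max() - slack[i]]
                   for s in range(n_states)]
    # Any remaining action is acceptable for all objectives; pick one.
    return np.array([allowed[s][0] for s in range(n_states)])

Setting every slack[i] to zero recovers a strict lexicographic ordering over the objectives.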
“…As with other forms of slack (Wray and Zilberstein 2015a), this algorithm only guarantees that the final joint policy $\pi^*$ has a value $V_0^{\pi^*}$ that is within $\delta$ of the more preferred objective's value $V_0^*$, which in this case is approximate because of the fixed set of controller nodes. It is not within slack of the true optimal value $V_0^*$, since we obviously did not compute that; $\pi^*$ is an approximate solution after all.…”
Section: Scalable Solution for CCPs
confidence: 99%
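In symbols, the quoted guarantee reads (our transcription, with objective 0 as the most preferred): $V_0^{\pi^*}(s) \ge V_0^{*}(s) - \delta$ for all states $s$, where $V_0^*$ here denotes the best value achievable with the fixed set of controller nodes rather than the true optimum.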
“…Our implementation uses Python 3.4.3 with scikit-learn 0.16.1, NumPy 1.9.2, and SciPy 0.15.1, run on an Intel(R) Core(TM) i7-4702HQ CPU at 2.20GHz with 8GB of RAM and an Nvidia(R) GeForce GTX 870M. We leverage a high-performing GPU-based implementation of PBVI using CUDA 6.5 (Wray and Zilberstein 2015a; 2015b). We compare our algorithm with the three original decision-theoretic algorithms designed for reluctant, fallible, and cost-varying oracles, denoted PAL #1, #2, and #3, respectively (Donmez and Carbonell 2008a).…”
Section: Experimentation
confidence: 99%