2021
DOI: 10.1007/s00521-021-05859-1

The impact of environmental stochasticity on value-based multiobjective reinforcement learning

Abstract: A common approach to address multiobjective problems using reinforcement learning methods is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism that is often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multi-objective Q-learning. It sho…
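To make the approach described in the abstract concrete, the sketch below shows vector-valued Q-learning with a linearly scalarised, epsilon-greedy action selection. It is a minimal illustration under assumed names and hyperparameters (n_states, n_actions, n_objectives, weights, alpha, gamma, epsilon), not the exact algorithm or settings evaluated in the paper.

```python
import numpy as np

# Minimal sketch of multi-objective Q-learning with vector-valued Q-values and
# linear scalarisation for action selection. All names and values here are
# illustrative assumptions, not taken from the paper.
n_states, n_actions, n_objectives = 10, 4, 2
Q = np.zeros((n_states, n_actions, n_objectives))   # one Q-value per objective
weights = np.array([0.7, 0.3])                      # scalarisation weights
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def select_action(state, rng):
    """Epsilon-greedy over the scalarised Q-vectors."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    scalarised = Q[state] @ weights                  # shape: (n_actions,)
    return int(np.argmax(scalarised))

def update(state, action, reward_vec, next_state):
    """Standard Q-learning update applied component-wise to the reward vector."""
    greedy_next = int(np.argmax(Q[next_state] @ weights))
    td_target = reward_vec + gamma * Q[next_state, greedy_next]
    Q[state, action] += alpha * (td_target - Q[state, action])

# usage: rng = np.random.default_rng(0); a = select_action(0, rng)
```

Because scalarisation is applied only at action-selection time, the stored Q-values remain vectors; it is the interaction of this style of algorithm with stochastic rewards and transitions that the paper examines.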

Cited by 15 publications (12 citation statements) · References 30 publications

Citation statements:
“…In order to apply RL to the real world the MORL community must consider the ESR criterion. However, the ESR criterion has largely been ignored by the MORL community, with the exception of the works of Roijers et al [31,30], Hayes et al [17,16] and Vamplew et al [39]. The works of Hayes et al [16,17] and Roijers et al [30] present single-policy algorithms that are suitable to learn policies under the ESR criterion, however, prior to this work, a formal definition of the necessary requirements to satisfy the ESR criterion had not previously been defined.…”
Section: Discussion (mentioning, confidence: 99%)
“…The current MORL literature focuses only on methods which learn the optimal set of policies under the SER criterion [23,41]. As already highlighted, the ESR criterion has largely been ignored by the MORL community, with a few exceptions [30,17,39]. In Section 6 we address this research gap and we present a novel multi-objective tabular distributional reinforcement learning (MOTDRL) algorithm that learns an optimal set of policies for the ESR criterion, also known as the ESR set, for multi-objective multi-armed bandit (MOMAB) problems.…”
Section: Multi-objective Tabular Distributional Reinforcement Learning (mentioning, confidence: 99%)
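As an aside on how a distributional, ESR-oriented method differs from expected-value methods in the bandit setting discussed above: the sketch below is not the MOTDRL algorithm from the cited work, only an assumed illustration. It keeps the observed vector returns per arm of a multi-objective bandit and ranks arms by a Monte-Carlo estimate of E[u(R)], the quantity the ESR criterion optimises, rather than by u(E[R]). The utility function and arm count are assumptions for the example.

```python
import numpy as np

def utility(r):
    # An illustrative nonlinear utility over a 2-objective return vector.
    return r[0] * r[1]

n_arms = 3
observed = [[] for _ in range(n_arms)]   # empirical return samples per arm

def record(arm, reward_vec):
    """Store one observed vector return for an arm."""
    observed[arm].append(np.asarray(reward_vec, dtype=float))

def esr_value(arm):
    """Monte-Carlo estimate of E[u(R)] for one arm."""
    samples = observed[arm]
    return np.mean([utility(r) for r in samples]) if samples else -np.inf

def best_arm_esr():
    """Pick the arm with the highest estimated expected utility of its return."""
    return int(np.argmax([esr_value(a) for a in range(n_arms)]))
```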
“…In the seventh paper, The impact of environmental stochasticity on value-based multiobjective reinforcement learning [7], Vamplew et al analyse the role of stochastic rewards and stochastic state transitions in multi-objective Q-Learning. Most of the previous empirical evaluations of multi-objective reinforcement learning and scalarisation methods assume that environments are deterministic.…”
Section: Contents of the Special Issue (mentioning, confidence: 99%)
“…Most of the previous empirical evaluations of multi-objective reinforcement learning and scalarisation methods assume that environments are deterministic. In [7], the authors find that stochasticity in rewards/transitions affects the optimal solution that agents can learn and, importantly, introduces important differences based on the choice of optimisation criterion (e.g. expected scalarised returns or scalarised expected returns).…”
Section: Contents of the Special Issue (mentioning, confidence: 99%)
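The distinction named in this passage between the two optimisation criteria can be written out explicitly. These are the standard formulations used in the MORL literature; the notation here (utility/scalarisation function f, vector reward r_t, discount gamma, policy pi) is assumed for illustration:

```latex
% Scalarised Expected Returns (SER): scalarise the expected return vector
\max_{\pi} \; f\!\left( \mathbb{E}\!\left[ \sum_{t} \gamma^{t} \mathbf{r}_t \,\middle|\, \pi \right] \right)

% Expected Scalarised Returns (ESR): take the expectation of the scalarised return
\max_{\pi} \; \mathbb{E}\!\left[ f\!\left( \sum_{t} \gamma^{t} \mathbf{r}_t \right) \,\middle|\, \pi \right]
```

For a linear f the two criteria coincide; under a nonlinear f combined with stochastic rewards or transitions they can prefer different policies, which is the difference the quoted passage refers to.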
“…Nevertheless, this violates the assumption of additive returns in the Bellman equation at the heart of these algorithms [Roijers et al, 2013], and therefore it may be necessary to condition the Q-values and the agent's choice of action on an augmented state formed by concatenating the environmental state with the summed rewards previously received by the agent [Geibel, 2006]. Additionally these approaches may fail to converge to the optimal policy in environments with stochastic state transitions [Vamplew et al, 2021a].…”
Section: Single-policy Algorithms (mentioning, confidence: 99%)
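As an illustration of the augmented-state idea mentioned in the quote (conditioning Q-values on the environmental state concatenated with the rewards accumulated so far), here is a minimal tabular sketch. The discretisation of accumulated rewards, the scalarisation weights, and all names are assumptions for the example, not details from the cited works.

```python
from collections import defaultdict
import numpy as np

n_actions, n_objectives = 4, 2
gamma, alpha = 0.95, 0.1
weights = np.array([0.5, 0.5])        # illustrative scalarisation weights

# Q-table keyed on the augmented state rather than the environmental state alone.
Q = defaultdict(lambda: np.zeros((n_actions, n_objectives)))

def augmented_state(env_state, summed_rewards, bucket=1.0):
    """Key: (environmental state, discretised accumulated reward vector).
    Discretisation keeps the augmented state space tabular; the bucket size
    is an assumption for illustration only."""
    buckets = tuple(np.round(np.asarray(summed_rewards) / bucket).astype(int))
    return (env_state, buckets)

def update(env_state, summed_rewards, action, reward_vec, next_env_state):
    summed_rewards = np.asarray(summed_rewards, dtype=float)
    reward_vec = np.asarray(reward_vec, dtype=float)
    s = augmented_state(env_state, summed_rewards)
    s_next = augmented_state(next_env_state, summed_rewards + reward_vec)
    # Greedy successor action under the scalarised Q-vectors of the augmented state.
    a_next = int(np.argmax(Q[s_next] @ weights))
    Q[s][action] += alpha * (reward_vec + gamma * Q[s_next][a_next] - Q[s][action])
```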