“…However, even though no specialized methods are needed to address this setting, it is nonetheless the most commonly studied setting for MORL. Linear scalarization with uniform weights, i.e., all the elements of w are equal, forms the basis of the work of Karlsson (1997), Ferreira, Bianchi, and Ribeiro (2012), Aissani, Beldjilali, and Trentesaux (2008) and Shabani (2009) amongst others, while non-uniform weights have been used by authors such as Castelletti et al (2002), Guo et al (2009) and Perez et al (2009). The majority of this work uses TD methods, which work on-line, although Castelletti et al (2010) extend off-line Fitted Q-Iteration (Ernst, Geurts, & Wehenkel, 2005) to multiple objectives.…”
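The excerpt above turns on linear scalarization: a weight vector w converts the vector-valued reward into a scalar via a weighted sum. A minimal Python sketch, with illustrative weights and rewards that are not taken from any of the cited papers:

```python
import numpy as np

def linear_scalarization(reward_vec, w):
    """Project a vector-valued reward onto a scalar via the weighted sum w . r."""
    return float(np.dot(w, reward_vec))

# Illustrative vector reward over three objectives.
r = np.array([1.0, -0.5, 2.0])

# Uniform weights: all elements of w are equal (here they also sum to 1).
w_uniform = np.ones(3) / 3
print(linear_scalarization(r, w_uniform))      # 0.833...

# Non-uniform weights: objectives weighted by assumed preferences.
w_nonuniform = np.array([0.6, 0.1, 0.3])
print(linear_scalarization(r, w_nonuniform))   # 1.15
```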
Sequential decision-making problems with multiple objectives arise naturally
in practice and pose unique challenges for research in decision-theoretic
planning and learning, which has largely focused on single-objective settings.
This article surveys algorithms designed for sequential decision-making
problems with multiple objectives. Though there is a growing body of literature
on this subject, little of it makes explicit under what circumstances special
methods are needed to solve multi-objective problems. Therefore, we identify
three distinct scenarios in which converting such a problem to a
single-objective one is impossible, infeasible, or undesirable. Furthermore, we
propose a taxonomy that classifies multi-objective methods according to the
applicable scenario, the nature of the scalarization function (which projects
multi-objective values to scalar ones), and the type of policies considered. We
show how these factors determine the nature of an optimal solution, which can
be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we
survey the literature on multi-objective methods for planning and learning.
Finally, we discuss key applications of such methods and outline opportunities
for future work.
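The survey's solution concepts can be made concrete with a Pareto-dominance check: under maximization, one value vector dominates another if it is at least as good on every objective and strictly better on at least one, and the Pareto front keeps only the non-dominated vectors. A minimal sketch with illustrative candidate values:

```python
import numpy as np

def dominates(u, v):
    """True if u Pareto-dominates v (maximization): at least as good on every
    objective and strictly better on at least one."""
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(values):
    """Filter a list of value vectors down to the non-dominated ones."""
    return [v for v in values
            if not any(dominates(u, v) for u in values if u is not v)]

# Illustrative two-objective values for four candidate policies.
values = [np.array([1.0, 3.0]), np.array([2.0, 2.0]),
          np.array([1.0, 2.5]),   # dominated by [1.0, 3.0]
          np.array([3.0, 1.0])]
print(pareto_front(values))       # keeps [1., 3.], [2., 2.] and [3., 1.]
```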
“…Otherwise an action is selected randomly. This has been the predominant exploration approach adopted in the MORL literature so far [12,15,16,19,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45].…”
Section: Exploration In Multiobjective RL
Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature, and examines their incorporation into multiobjective Q-learning. In particular, this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact on the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax-epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation.
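The abstract above does not spell out the softmax-epsilon operator, so the sketch below is only one plausible reading: scalarise the vector-valued Q-estimates with a weight vector, draw from a Boltzmann (softmax) distribution over the resulting scalars, and fall back to a uniformly random action with probability epsilon. The combination, the weights, and the hyperparameters are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax_epsilon_action(q_vectors, w, epsilon=0.1, temperature=1.0, rng=None):
    """Pick an action from vector-valued Q-estimates.

    q_vectors : array of shape (n_actions, n_objectives)
    w         : scalarization weights, shape (n_objectives,)

    With probability epsilon the action is uniformly random; otherwise it is
    sampled from a softmax over the linearly scalarised values. This exact
    combination is an illustrative assumption, not the operator from the paper.
    """
    rng = rng or np.random.default_rng()
    n_actions = q_vectors.shape[0]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))        # random exploratory action
    scalar_q = q_vectors @ w                       # project vectors to scalars
    logits = (scalar_q - scalar_q.max()) / temperature
    probs = np.exp(logits) / np.exp(logits).sum()  # numerically stable softmax
    return int(rng.choice(n_actions, p=probs))

# Illustrative Q-estimates: 3 actions, 2 objectives.
Q = np.array([[1.0, 0.2], [0.5, 0.8], [0.1, 1.0]])
print(softmax_epsilon_action(Q, w=np.array([0.5, 0.5])))
```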
“…An advantage of this approach is that the base policies can be found using simple methods. For example, linearly scalarised temporal difference learning has been widely used to find LDS policies for MORL tasks [20,21,22]. Linear scalarisation takes a weighted sum of the rewards, converting the problem to a single-objective MDP so standard TD-based methods can be used [5].…”
Section: Learning Stochastic or Non-stationary Multiobjective Policies
For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent's target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.
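The excerpt at the start of this entry notes that linear scalarisation reduces the task to a single-objective MDP, so standard TD methods can learn the LDS base policies. Below is a minimal sketch of one scalarised tabular Q-learning step; the state/action encoding, weights, and hyperparameters are illustrative assumptions rather than any cited paper's setup.

```python
import numpy as np

def scalarised_q_update(Q, s, a, reward_vec, s_next, w, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step on the linearly scalarised MDP.

    Q          : dict mapping each state to an array of scalar action values
    reward_vec : vector-valued reward observed from the environment
    w          : fixed scalarization weights (an assumption of this sketch)
    """
    r = float(np.dot(w, reward_vec))            # weighted sum of the rewards
    td_target = r + gamma * np.max(Q[s_next])   # ordinary single-objective target
    Q[s][a] += alpha * (td_target - Q[s][a])    # TD update toward the target
    return Q

# Illustrative usage: 2 states, 2 actions, 2 objectives.
Q = {0: np.zeros(2), 1: np.zeros(2)}
Q = scalarised_q_update(Q, s=0, a=1, reward_vec=np.array([1.0, -0.2]),
                        s_next=1, w=np.array([0.7, 0.3]))
print(Q[0])   # [0.    0.064]
```

Acting greedily with respect to such a table yields a deterministic stationary policy, i.e. one of the LDS base policies the excerpt refers to.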