2012 Brazilian Robotics Symposium and Latin American Robotics Symposium
DOI: 10.1109/sbr-lars.2012.10

Multi-agent Multi-objective Learning Using Heuristically Accelerated Reinforcement Learning

Cited by 6 publications (3 citation statements)
References 18 publications
“…However, even though no specialized methods are needed to address this setting, it is nonetheless the most commonly studied setting for MORL. Linear scalarization with uniform weights, i.e., all the elements of w are equal, forms the basis of the work of Karlsson (1997), Ferreira, Bianchi, and Ribeiro (2012), Aissani, Beldjilali, and Trentesaux (2008) and Shabani (2009) amongst others, while non-uniform weights have been used by authors such as Castelletti et al (2002), Guo et al (2009) and Perez et al (2009). The majority of this work uses TD methods, which work on-line, although Castelletti et al (2010) extend off-line Fitted Q-Iteration (Ernst, Geurts, & Wehenkel, 2005) to multiple objectives.…”
Section: Single-policy Learning Methods (citation type: mentioning)
confidence: 99%
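The excerpt above describes single-policy MORL based on linear scalarization with uniform weights, typically combined with on-line TD methods. A minimal sketch of that idea is given below, using tabular Q-learning on a weighted-sum reward; all names, shapes, and hyperparameters are illustrative assumptions rather than details from the cited papers.

```python
import numpy as np

# Minimal sketch: tabular Q-learning on a linearly scalarised reward with
# uniform weights. All names, shapes, and hyperparameters here are
# illustrative assumptions, not taken from the cited papers.

n_states, n_actions, n_objectives = 10, 4, 2
alpha, gamma = 0.1, 0.95

# Uniform weights: every objective contributes equally to the scalar value.
w = np.ones(n_objectives) / n_objectives

Q = np.zeros((n_states, n_actions))  # ordinary scalar Q-table

def td_update(s, a, reward_vec, s_next):
    """One Q-learning step on the weighted-sum (scalarised) reward."""
    r = float(w @ np.asarray(reward_vec))      # vector reward -> scalar
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```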
“…Otherwise an action is selected randomly. This has been the predominant exploration approach adopted in the MORL literature so far [12,15,16,19,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45].…”
Section: Exploration in Multiobjective RL (citation type: mentioning)
confidence: 99%
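The truncated excerpt above describes greedy selection with occasional random actions, which reads as an ε-greedy style rule over scalarised Q-values. The sketch below illustrates that rule; the vector-valued Q-table shape and the value of ε are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a greedy-else-random (epsilon-greedy style) rule over
# scalarised Q-values: act greedily with probability 1 - epsilon, otherwise
# pick a random action. Shapes and the value of epsilon are assumptions.

rng = np.random.default_rng(0)
epsilon = 0.1

def select_action(Q_vec, state, w):
    """Q_vec has shape (n_states, n_actions, n_objectives); w has shape (n_objectives,)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q_vec.shape[1]))   # explore: uniform random action
    scalar_q = Q_vec[state] @ w                    # (n_actions,) scalarised action values
    return int(np.argmax(scalar_q))                # exploit: greedy w.r.t. scalarised values
```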
“…An advantage of this approach is that the base policies can be found using simple methods. For example, linearly scalarised temporal difference learning has been widely used to find LDS policies for MORL tasks [20,21,22]. Linear scalarisation takes a weighted sum of the rewards, converting the problem to a single-objective MDP so standard TD-based methods can be used [5].…”
Section: Learning Stochastic or Non-stationary Multiobjective Policies (citation type: mentioning)
confidence: 99%
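The last excerpt notes that linear scalarisation converts a multiobjective problem into a single-objective MDP by taking a weighted sum of the rewards, after which standard TD-based methods apply unchanged. One way to picture that conversion is as a reward wrapper around the environment; the interface assumed below (reset() and step() returning a reward vector) is a hypothetical stand-in, not an API from the cited works.

```python
import numpy as np

# Sketch of linear scalarisation as an environment wrapper: the wrapped
# environment returns the weighted sum w . r instead of the reward vector r,
# so the problem becomes a single-objective MDP and any standard TD learner
# can be used unchanged. "env" is assumed to expose reset() and
# step(action) -> (next_state, reward_vector, done); this interface is a
# hypothetical stand-in, not an API from the cited works.

class ScalarisedEnv:
    def __init__(self, env, w):
        self.env = env
        self.w = np.asarray(w, dtype=float)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        next_state, reward_vec, done = self.env.step(action)
        scalar_reward = float(self.w @ np.asarray(reward_vec, dtype=float))
        return next_state, scalar_reward, done
```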