Multi-objective multiagent credit assignment in reinforcement learning and NSGA-II
2016
DOI: 10.1007/s00500-016-2124-z

Cited by 15 publications (23 citation statements, published 2018–2023)
References 48 publications

Citation statements (ordered by relevance):
“…Some recent works have sought to derive coverage sets in MOMMDPs using reinforcement learning or evolutionary algorithms (e.g. [129,128,63,60,59,58]). As in single-objective MMDPs, learning joint policies which coordinate agents' actions to get the desired outcome(s) in MOMMDPs is a difficult problem.…”
Section: Team Reward - Team Utility (TRTU); citation type: mentioning
confidence: 99%
“…Linear scalarization is commonly used in MORL literature (see e.g. Vamplew et al., 2010; Roijers et al., 2013; Van Moffaert et al., 2013; Brys et al., 2014; Mason et al., 2016; Mannion et al., 2016c, 2016d; Yliniemi & Tumer, 2016), and is defined in Equation (9) below:

$r^+ = \sum_{c \in C} w_c \, r_c \quad (9)$

where $w$ is the objective weight vector, $w_c$ is the weight for objective $c$, $r^+$ is the scalarized reward signal, $r_c$ is the component of the reward vector $r$ for objective $c$, and $C$ is the set of objectives. When using linear scalarization, altering the weights in the weight vector allows the user to express the relative importance of the objectives.…”
Section: Background and Related Work; citation type: mentioning
confidence: 99%
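The excerpt above defines linear scalarization as a weighted sum over objectives. The following minimal Python sketch illustrates that weighted sum only; the function name `scalarize` and the example weights and reward vector are illustrative assumptions, not code or values from the cited papers.

```python
import numpy as np

def scalarize(reward_vector: np.ndarray, weights: np.ndarray) -> float:
    """Linearly scalarize a multi-objective reward vector.

    Computes r+ = sum over c of w_c * r_c (Equation (9) in the excerpt above):
    each objective's reward component is weighted and the weighted components
    are summed into a single scalar reward signal.
    """
    assert reward_vector.shape == weights.shape, "one weight per objective"
    return float(np.dot(weights, reward_vector))

# Hypothetical example with three objectives; changing the weights expresses
# the relative importance of each objective in the scalarized signal.
r = np.array([1.0, -0.5, 2.0])   # reward vector, one component per objective
w = np.array([0.5, 0.3, 0.2])    # objective weight vector
print(scalarize(r, w))           # 0.5*1.0 + 0.3*(-0.5) + 0.2*2.0 = 0.75
```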
“…In order to produce a set of Pareto optimal solutions using scalarized single-policy RL algorithms, researchers typically record the best non-dominated solutions found during a number of independent runs (see e.g. Vamplew et al., 2010; Van Moffaert et al., 2013; Yliniemi & Tumer, 2016; Mannion et al., 2017). These solutions are then compared with one another to produce an approximation of the Pareto front.…”
Section: Background and Related Work; citation type: mentioning
confidence: 99%
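As a companion sketch to the excerpt above, the snippet below shows the post-processing step it describes: collecting objective vectors recorded over independent runs and filtering them down to the non-dominated set that approximates the Pareto front. The function names, the assumption that every objective is maximized, and the example data are hypothetical, not taken from the cited works.

```python
import numpy as np

def is_dominated(point: np.ndarray, others: np.ndarray) -> bool:
    """True if some other point is >= on every objective and > on at least one
    (all objectives assumed to be maximized)."""
    return bool(np.any(np.all(others >= point, axis=1) &
                       np.any(others > point, axis=1)))

def pareto_front(solutions: np.ndarray) -> np.ndarray:
    """Return the non-dominated subset of a set of objective vectors."""
    keep = [i for i, p in enumerate(solutions)
            if not is_dominated(p, np.delete(solutions, i, axis=0))]
    return solutions[keep]

# Hypothetical objective vectors recorded across several independent runs
# (rows = candidate solutions, columns = objectives, both maximized).
recorded = np.array([[1.0, 5.0],
                     [2.0, 4.0],
                     [1.5, 4.5],
                     [0.5, 3.0]])   # last row is dominated by every other row
print(pareto_front(recorded))      # keeps [1.0, 5.0], [2.0, 4.0], [1.5, 4.5]
```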