2008
DOI: 10.1007/s10458-008-9046-9
Analyzing and visualizing multiagent rewards in dynamic and stochastic domains

Abstract: The ability to analyze the effectiveness of agent reward structures is critical to the successful design of multiagent learning algorithms. Though final system performance is the best indicator of the suitability of a given reward structure, it is often preferable to analyze the reward properties that lead to good system behavior (i.e., properties promoting coordination among the agents and providing agents with strong signal to noise ratios). This step is particularly helpful in continuous, dynamic, stochasti…

Cited by 86 publications (79 citation statements). References 21 publications (37 reference statements).
“…Thus, an agent acting to increase the Difference Reward will also act to increase the global reward. This property is termed factoredness [5]. Further, because the Difference Reward only depends on the actions of agent i, noise from other agents is reduced in the feedback given by D i .…”
Section: Difference Reward (mentioning)
confidence: 99%
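Restating that property in the notation commonly used for difference rewards (the symbols below are the conventional ones from this literature, not taken verbatim from the citing paper):

D_i(z) = G(z) - G(z_{-i})

where z is the joint state and z_{-i} is z with agent i's contribution removed. The subtracted term does not depend on agent i's action, so any action by agent i that increases D_i increases the global reward G by the same amount (factoredness), and the contributions of the other agents largely cancel between the two terms, which is why the feedback D_i carries less noise.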
“…The weights of the neural network are adjusted through an evolutionary search algorithm [3,2] for ranking and subsequently locating successful networks within a population [12,3]. The algorithm maintains a population of ten networks, utilizes mutation to modify individuals, and ranks them based on a performance metric specific to the domain.…”
Section: Robot Capabilities (mentioning)
confidence: 99%
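A minimal Python sketch of the evolutionary search loop that statement describes; the population size of ten is from the quote, while the mutation scale, selection scheme, and fitness function are illustrative assumptions:

import numpy as np

POP_SIZE = 10        # population of ten networks, as described in the quote
MUTATION_STD = 0.1   # assumed mutation scale (not given in the source)

def evaluate(weights):
    # Placeholder for the domain-specific performance metric.
    return -float(np.sum(weights ** 2))

def evolve(num_weights, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    # One flat weight vector stands in for each network in the population.
    population = [rng.normal(size=num_weights) for _ in range(POP_SIZE)]
    for _ in range(generations):
        # Rank networks by the performance metric (higher is better).
        population.sort(key=evaluate, reverse=True)
        # Keep the better half and refill the population by mutating survivors.
        survivors = population[: POP_SIZE // 2]
        mutants = [w + rng.normal(scale=MUTATION_STD, size=num_weights)
                   for w in survivors]
        population = survivors + mutants
    return max(population, key=evaluate)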
“…• The difference evaluation reflects the impact a robot has on the full system [3,2]. By removing the value of the system evaluation where robot i is inactive, the difference evaluation computes the value added by the observations of robot i alone.…”
Section: Robot Objectives (mentioning)
confidence: 99%
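A small Python sketch of that counterfactual computation; the system evaluation G and the per-robot observation list are assumed interfaces, not the cited implementation:

def difference_evaluation(G, observations, i):
    # G maps a list of per-robot observations to a system evaluation (assumed signature).
    # Value added by robot i: the full-system evaluation minus the evaluation
    # recomputed with robot i's observations removed (robot i inactive).
    without_i = [obs for j, obs in enumerate(observations) if j != i]
    return G(observations) - G(without_i)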
“…As a consequence, in this work, we use the difference reward as a starting point for the reward an agent receives after each step. Earlier work has shown that the difference reward significantly outperforms both agents receiving a purely local reward and all agents receiving the same system reward [3,2,33,32,36]. The difference reward is given by:…”
Section: Basic Agent Learning (mentioning)
confidence: 99%
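The quoted sentence is truncated before the formula. The form commonly used in the works it builds on (restated here as background, not copied from the citing paper) is

D_i(z) = G(z) - G(z_{-i} + c_i)

where G is the system reward, z_{-i} is the joint state with agent i's contribution removed, and c_i is a fixed counterfactual substituted for agent i.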
“…In these cases, the learning needs of the agents are modified to account for their presence in a larger system [2,11,13,22,35,37]. However, though these methods have yielded tremendous advances in multiagent learning, they are principally based on an agent trying an action, receiving an evaluation of that action, and updating its own estimate on the "value" of taking that action in that state.…”
Section: Introduction (mentioning)
confidence: 99%
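A minimal sketch of the try-evaluate-update loop that statement describes, written as a tabular value update in Python; the learning rate and table representation are assumptions, not details from the citing paper:

from collections import defaultdict

ALPHA = 0.1  # assumed learning rate

q_values = defaultdict(float)  # estimated value of each (state, action) pair

def update(state, action, evaluation):
    # Move the stored estimate toward the evaluation the agent received
    # for trying this action in this state.
    key = (state, action)
    q_values[key] += ALPHA * (evaluation - q_values[key])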