2013
DOI: 10.1609/aaai.v27i1.8580
Multiagent Learning with a Noisy Global Reward Signal

Abstract: Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze. This makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. I…
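The abstract's central object, the difference reward, has a standard definition in this literature. As a minimal sketch (notation assumed here, not taken from this page): with G the global reward, z the joint state-action of all agents, z_{-i} the system with agent i's contribution removed, and c_i a fixed counterfactual (e.g. null) action substituted for agent i,

```latex
\[
  D_i(z) \;=\; G(z) \;-\; G\!\left(z_{-i} \cup c_i\right)
\]
```

The second term is the source of the difficulty the abstract refers to: evaluating the counterfactual global reward generally requires either a model of G or additional simulations.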

Cited by 3 publications (4 citation statements). References 18 publications.
“…We use the following baselines: two domain-specific approaches, AT-BASELINE (Brittain and Wei 2019) and AT-DR (Proper and Tumer 2013). LOCAL-DR (Colby et al. 2016) is another DR-based baseline.…”
Section: Methods
confidence: 99%
“…We now show how to compute difference rewards (DRs) from r w in a model-free setting without requiring any extra simulations or domain expertise. Unlike previous approaches (Proper and Tumer 2013), our method is not tied to a domain, and it explicitly utilizes aggregate information over all the agents, in contrast to methods that use only local information available to an agent (Colby et al. 2016). As a result, our method yields much better solution quality when combined with a policy optimization technique than such previous approaches.…”
Section: Computing Difference Rewards
confidence: 99%
“…The closest works addressing team rewards in cooperative settings that we could find are works on difference rewards, which try to measure the impact of an individual agent's actions on the full system reward [12]. Their high learnability, among other nice properties, makes difference rewards attractive but impractical, due to the required knowledge of the total system state [13][14][15].…”
Section: Introduction
confidence: 99%
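To make concrete why difference rewards are described above as requiring knowledge of the total system state, here is a hedged toy sketch (the reward function and helper names are hypothetical, not from the cited papers): computing D_i by re-evaluating a global reward with agent i's action replaced by a counterfactual, which needs the full joint action vector.

```python
def global_reward(joint_actions):
    """Toy global reward G(z): counts distinct targets covered by the team.
    A hypothetical stand-in for a (possibly noisy) system-level objective."""
    return float(len({a for a in joint_actions if a is not None}))


def difference_reward(joint_actions, i, default_action=None):
    """D_i = G(z) - G(z_{-i} + c_i): re-evaluate G with agent i's action
    replaced by a default counterfactual. Both terms need the *full* joint
    action vector, which is what makes difference rewards impractical
    without a model of G or extra simulations."""
    counterfactual = list(joint_actions)
    counterfactual[i] = default_action
    return global_reward(joint_actions) - global_reward(counterfactual)


if __name__ == "__main__":
    z = [0, 1, 1, 2]  # actions (chosen targets) of four agents
    for i in range(len(z)):
        print(f"agent {i}: D_i = {difference_reward(z, i)}")
```

In this toy run, agents 1 and 2 both pick target 1, so each receives D_i = 0 (removing either leaves G unchanged), while agents 0 and 3 receive D_i = 1; this per-agent credit sharpening is what makes difference rewards attractive despite the cost of the counterfactual evaluation.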