2020
DOI: 10.48550/arXiv.2003.10598
Preprint

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward

Abstract: Many cooperative multi-agent problems require agents to learn individual tasks while contributing to the collective success of the group. This is a challenging task for current state-of-the-art multi-agent reinforcement learning algorithms, which are designed to either maximize the global reward of the team or the individual local rewards. The problem is exacerbated when either of the rewards is sparse, leading to unstable learning. To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG)…
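The decomposition the abstract alludes to can be made concrete with a short sketch: a centralized global critic is trained on the team reward, per-agent local critics are trained on the individual rewards, and each actor ascends the sum of the two value estimates. The code below is a minimal illustrative rendering of that idea in PyTorch, not the authors' implementation; all module names, dimensions, and the mixing weight `lam` are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the decomposed-critic idea from the abstract:
# a centralized global critic scores the joint state-action on the team
# reward, per-agent local critics score local observation-action pairs on
# the individual rewards, and each actor maximizes the sum of both value
# estimates. Names, sizes, and `lam` are assumptions, not the paper's API.

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

n_agents, obs_dim, act_dim = 3, 8, 2
state_dim = n_agents * obs_dim                          # joint observation

actors = [mlp(obs_dim, act_dim) for _ in range(n_agents)]
global_critic = mlp(state_dim + n_agents * act_dim, 1)  # Q_g(x, a_1..a_N)
local_critics = [mlp(obs_dim + act_dim, 1) for _ in range(n_agents)]

def actor_loss(i, obs_all):
    """Deterministic policy-gradient loss for agent i on one batch."""
    acts = [actor(o) for actor, o in zip(actors, obs_all)]
    joint = torch.cat(list(obs_all) + acts, dim=-1)
    q_global = global_critic(joint)
    q_local = local_critics[i](torch.cat([obs_all[i], acts[i]], dim=-1))
    lam = 1.0                                           # assumed mixing weight
    return -(q_global + lam * q_local).mean()           # minimize = ascend Q

obs_all = [torch.randn(32, obs_dim) for _ in range(n_agents)]
print(actor_loss(0, obs_all))                           # smoke test
```

In a full training loop the two critics would first be fitted with separate TD targets, against the team reward and the individual rewards respectively, before the actor step sketched above is taken.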

Cited by 2 publications (2 citation statements)
References 14 publications
“…We then designed the execution and training steps of the algorithm. Following the previous methods of resource allocation in vehicular networks [40–42], we set a data packet transmission task as an episode and set the maximum transmission time duration as the maximum time step for each episode. It is worth mentioning that, for the sake of easier comparison with previous methods, we adopted the same execution and training framework and a similar MDP transition process.…”
Section: Learning Algorithm and Training Setup
confidence: 99%
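The episode convention described in this statement is simple enough to pin down in a few lines. The sketch below is a hypothetical stand-in assuming a generic multi-agent environment API; `env`, `agents`, their methods, and `MAX_TIME_STEPS` are all illustrative names, not the cited papers' code.

```python
# Sketch of the episode convention above: each data-packet transmission
# task is one episode, truncated at a maximum number of time steps derived
# from the maximum transmission duration. All names are assumptions.

MAX_TIME_STEPS = 100        # assumed cap from the max transmission duration

def run_episode(env, agents):
    obs = env.reset()       # start a new packet-transmission task
    for t in range(MAX_TIME_STEPS):
        actions = [agent.act(o) for agent, o in zip(agents, obs)]
        obs, rewards, done, _ = env.step(actions)
        for agent, r in zip(agents, rewards):
            agent.observe(r)    # store the transition for training
        if done:                # packet delivered (or dropped) early
            break
```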
“…Another solution is to use an RNN instead of a feedforward neural network and consider the information of the agent's state and action histories h_t for action selection [42]. Since the state of the environment in our proposed model is partially observable and the agents have two different objectives, individual and common, we utilize MASRDDPG, a derivative of the decomposed multi-agent deep deterministic policy gradient (DE-MADDPG) proposed in [43]. For further analysis, we consider the RDPG method as a second solution, since it is suitable for partially observable and uncertain environments [42].…”
Section: Solution and Algorithm
confidence: 99%
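The recurrent-policy alternative mentioned here, feeding the observation-action history h_t through an RNN so the agent can act under partial observability, can be sketched as follows. The `RecurrentActor` class, its dimensions, and the choice of a GRU are illustrative assumptions, not the MASRDDPG or RDPG implementation.

```python
import torch
import torch.nn as nn

# Sketch of a recurrent actor for partial observability: a GRU summarizes
# the agent's observation-action history into a hidden state h_t, which is
# carried across time steps in place of the full state. Illustrative only.

class RecurrentActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        # input at each step: current observation plus previous action
        self.gru = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, prev_act, h=None):
        # obs: (batch, 1, obs_dim), prev_act: (batch, 1, act_dim)
        x = torch.cat([obs, prev_act], dim=-1)
        out, h = self.gru(x, h)          # h carries the history summary h_t
        return torch.tanh(self.head(out)), h

actor = RecurrentActor(obs_dim=8, act_dim=2)
obs = torch.randn(4, 1, 8)
prev_act = torch.zeros(4, 1, 2)
action, h = actor(obs, prev_act)         # h is reused at the next time step
```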