2020 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn48605.2020.9206879

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward

Cited by 30 publications (15 citation statements) | References 8 publications
“…The DE-MADDPG algorithm is an extended version of the MADDPG algorithm, which improves the network architecture of the MADDPG algorithm [33]. The MADDPG algorithm implements centralized training through a global centralized critic network.…”
Section: DE-MADDPG Approach
confidence: 99%
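The dual-critic layout described in the excerpt above (one shared global critic for centralized training plus one local critic per agent) can be sketched as follows; class and attribute names are illustrative only, not taken from the paper's implementation:

```python
# Minimal sketch of the DE-MADDPG critic layout, assuming one shared
# global critic Q_psi and one local critic Q_phi_i per agent.
class DualCriticAgents:
    def __init__(self, n_agents):
        # one global critic shared by all agents (centralized training)
        self.global_critic = {"params": "psi"}
        # one local critic per agent i, each with its own parameters phi_i
        self.local_critics = [{"params": f"phi_{i}"} for i in range(n_agents)]

agents = DualCriticAgents(3)
# -> 1 shared global critic, 3 per-agent local critics
```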
See 1 more Smart Citation
“…The DE-MADDPG algorithm is an extended version of the MADDPG algorithm, which improves the network architecture of the MADDPG algorithm [33]. The MADDPG algorithm implements centralized training through a global centralized critic network.…”
Section: De-maddpg Approachmentioning
confidence: 99%
“…where D is the experience replay buffer, Q_{φ_i} is the local critic network of agent i with parameters φ_i, and Q_ψ is the global critic network with parameters ψ [33].…”
Section: DE-MADDPG Approach
confidence: 99%
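The update the excerpt refers to bootstraps each critic toward a standard TD target of the form y = r + γ·Q(s′, a′), computed separately against the global and the local reward. A hedged sketch, with hypothetical variable names (the paper's own code is not reproduced here):

```python
# Sketch of the TD targets for the two critics in a dual-critic setup:
# the shared global critic Q_psi regresses toward y_global, and agent i's
# local critic Q_phi_i regresses toward y_local.
def td_targets(r_global, r_local, q_global_next, q_local_next, gamma=0.95):
    """Return TD targets y = r + gamma * Q(s', a') for the global and
    local critics, given next-state critic estimates."""
    y_global = r_global + gamma * q_global_next  # target for Q_psi
    y_local = r_local + gamma * q_local_next     # target for Q_phi_i
    return y_global, y_local

y_g, y_l = td_targets(r_global=1.0, r_local=0.5,
                      q_global_next=2.0, q_local_next=1.0, gamma=0.9)
# y_g = 1.0 + 0.9 * 2.0 = 2.8; y_l = 0.5 + 0.9 * 1.0 = 1.4
```

Each critic is then trained to minimize the squared error between its prediction on transitions sampled from the replay buffer D and its corresponding target.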
“…RL techniques have been well studied, and applying them to multi-agent systems is a recent open discussion. The objectives of state-of-the-art MARL algorithms fall into two categories [11]. One is to maximize the global reward for success as a team, as in COMA [10].…”
Section: A Concept of RL and MARL
confidence: 99%
“…In addition, "target policy smoothing" was introduced, which adds clipped Gaussian noise to the selected action to avoid overfitting to narrow peaks in the value estimate, a concern with deterministic policies. Recently, TD3 has been applied to multi-agent systems in several works as multi-agent TD3 (MATD3) [18], [11]. [18] has a decentralized actor-critic structure similar to [13], but uses a TD3 network instead of DDPG.…”
Section: B Policy Gradient Algorithms
confidence: 99%
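The target policy smoothing mentioned in the excerpt above can be sketched in a few lines: clipped Gaussian noise is added to the target policy's action, and the result is clipped back into the valid action range. Parameter names (`sigma`, `noise_clip`, `act_limit`) follow common TD3 conventions but are assumptions here, not taken from the cited implementations:

```python
import numpy as np

# Sketch of TD3-style target policy smoothing: a' = clip(mu(s') + eps),
# where eps ~ clip(N(0, sigma), -noise_clip, noise_clip).
def smoothed_target_action(mu_target, sigma=0.2, noise_clip=0.5,
                           act_limit=1.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    noise = np.clip(rng.normal(0.0, sigma, size=np.shape(mu_target)),
                    -noise_clip, noise_clip)
    # clip the perturbed action back into the valid range
    return np.clip(mu_target + noise, -act_limit, act_limit)

a = smoothed_target_action(np.array([0.9, -0.9]))
# every component of a stays within [-act_limit, act_limit]
```

Smoothing the target action in this way regularizes the critic: similar actions receive similar value estimates, so a deterministic policy cannot exploit a single sharp, erroneous peak in Q.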