2020
DOI: 10.1109/access.2020.3011670

A Novel Multi-Agent Parallel-Critic Network Architecture for Cooperative-Competitive Reinforcement Learning

Abstract: Multi-agent deep reinforcement learning (MDRL) is an emerging research hotspot and application direction in the fields of machine learning and artificial intelligence. MDRL covers many algorithms, rules, and frameworks; it is currently being researched in swarm systems, energy allocation optimization, stock analysis, and sequential social dilemmas, and has an extremely bright future. In this paper, a parallel-critic method based on the classic MDRL algorithm MADDPG is proposed to alleviate the training instability problem in c…
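The abstract is truncated above, but the core idea it names — training several centralized critics in parallel on top of a MADDPG-style architecture — can be sketched. The following PyTorch snippet is a minimal illustration only: the class names (`Critic`, `ParallelCritics`), layer sizes, and the mean-aggregation rule are assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a parallel-critic value head for MADDPG-style training.
# Names, sizes, and the mean-aggregation rule are illustrative assumptions.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Centralized critic: maps joint observations + joint actions to one Q-value."""

    def __init__(self, joint_obs_dim: int, joint_act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))


class ParallelCritics(nn.Module):
    """Several independently initialized critics evaluated in parallel; their
    estimates are averaged so no single critic's error dominates the target."""

    def __init__(self, joint_obs_dim: int, joint_act_dim: int, n_critics: int = 3):
        super().__init__()
        self.critics = nn.ModuleList(
            [Critic(joint_obs_dim, joint_act_dim) for _ in range(n_critics)]
        )

    def forward(self, joint_obs, joint_act):
        qs = torch.stack([c(joint_obs, joint_act) for c in self.critics], dim=0)
        return qs.mean(dim=0)  # a min() here would give a more pessimistic estimate
```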

Cited by 13 publications (9 citation statements)
References 19 publications (18 reference statements)
“…The distance between the agent and the corresponding target point is inversely proportional to the obtained reward. As shown in Table 2, a comparison is made between the proposed approach and existing advanced learning methods [19,30,31,33] across six aspects: initial reward, maximum reward, stable reward, growth reward, training time (episodes), and standardized variance.…”
Section: Methods (mentioning)
confidence: 99%
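For concreteness, one common way to realize an "inversely proportional" distance-based reward is shown below; the scaling constant and epsilon are illustrative assumptions, not the cited paper's exact shaping function.

```python
# Hypothetical reading of "reward inversely proportional to distance";
# scale and eps are illustrative, not taken from the cited paper.
def distance_reward(distance: float, scale: float = 1.0, eps: float = 1e-6) -> float:
    # Reward grows as the agent approaches its target point.
    return scale / (distance + eps)
```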
“…The second is to change the evaluation strategy of the critic network, aiming for a higher response speed together with fewer calculation errors [15,28,29]. With a similar focus, the methods proposed in [19,30,31] take the characteristics of these two types of methods into account and present excellent performance: their growth reward, training speed, and stability are significantly better than those of other methods. In reference [19], a strategy smoothing technique is introduced into the MADDPG method to reduce the variance of learned strategies, which alleviates the training instability of cooperative and competitive multi-agent settings and significantly improves the stability and performance of training.…”
Section: Related Work (mentioning)
confidence: 99%
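The strategy-smoothing idea attributed to reference [19] resembles TD3-style target-policy smoothing. As a hedged sketch (the exact rule in [19] is not quoted here), the snippet below perturbs the target actors' actions with clipped Gaussian noise before evaluating the target critic, flattening sharp peaks in the value estimate; all function and tensor names are assumptions.

```python
# Sketch of TD3-style target-policy smoothing applied to a MADDPG-like TD target.
# All names and hyperparameters are illustrative assumptions.
import torch


def smoothed_td_target(reward, done, next_obs_per_agent, target_actors,
                       target_critic, gamma=0.95, noise_std=0.2, noise_clip=0.5):
    """Compute y = r + gamma * Q'(o', a' + clipped noise) for one agent's critic."""
    with torch.no_grad():
        # Each target actor proposes its next action from its own observation slice.
        next_actions = [actor(obs)
                        for actor, obs in zip(target_actors, next_obs_per_agent)]
        # Perturb target actions with clipped Gaussian noise so the target averages
        # over nearby actions, reducing variance from sharp Q-value peaks.
        noisy = []
        for a in next_actions:
            noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
            noisy.append((a + noise).clamp(-1.0, 1.0))
        q_next = target_critic(torch.cat(next_obs_per_agent, dim=-1),
                               torch.cat(noisy, dim=-1))
        return reward + gamma * (1.0 - done) * q_next
```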
“…As pointed out by Ackermann et al. (2019), the overestimation bias is also present in MARL. Some initial works have proposed bridging concepts from the single-agent domain (van Hasselt, 2010) to MARL (Sun et al., 2020). Thus, SAC has been adjusted to the multi-agent domain by Wei et al. (2018), for which further extensions have been outlined; e.g., Zhang et al. (2020) proposed adding a Lyapunov-based penalty term to the policy update to stabilize the policy gradient.…”
Section: Related Work (mentioning)
confidence: 99%
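To make the overestimation-bias remedy concrete: the single-agent double estimator of van Hasselt (2010) is commonly bridged to (MA)RL as a clipped double-critic target. The sketch below is a generic illustration with assumed names, not the specific construction of Sun et al. (2020).

```python
# Generic clipped double-Q target in the spirit of van Hasselt (2010);
# names and the 0.99 discount are illustrative assumptions.
import torch


def clipped_double_q_target(reward, done, next_obs, next_act,
                            target_critic_1, target_critic_2, gamma=0.99):
    """Take the minimum of two independent target critics so the TD target
    follows the more pessimistic estimate, suppressing overestimation."""
    with torch.no_grad():
        q1 = target_critic_1(next_obs, next_act)
        q2 = target_critic_2(next_obs, next_act)
        return reward + gamma * (1.0 - done) * torch.min(q1, q2)
```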