Multi-Agent Constrained Policy Optimisation
2021 | Preprint | DOI: 10.48550/arxiv.2110.02793

Abstract: Developing reinforcement learning algorithms that satisfy safety constraints is becoming increasingly important in real-world applications. In multi-agent reinforcement learning (MARL) settings, policy optimisation with safety awareness is particularly challenging because each individual agent has to not only meet its own safety constraints, but also consider those of others so that their joint behaviour can be guaranteed safe. Despite its importance, the problem of safe multi-agent learning has not been rigorously studied…
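For orientation, safe MARL of this kind is usually formalised as a constrained Markov game; the sketch below uses generic notation assumed by this summary rather than taken from the paper itself. The joint policy maximises the shared return while each agent i keeps every expected discounted cost below its bound:

\[
\max_{\boldsymbol{\pi}} \; J(\boldsymbol{\pi}) \;=\; \mathbb{E}_{\boldsymbol{\pi}}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, \mathbf{a}_t)\right]
\qquad \text{s.t.} \qquad
J_j^{i}(\boldsymbol{\pi}) \;=\; \mathbb{E}_{\boldsymbol{\pi}}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_j^{i}(s_t, \mathbf{a}_t)\right] \;\le\; b_j^{i} \quad \forall\, i, j,
\]

where \boldsymbol{\pi} is the joint policy, r the shared reward, c_j^{i} the j-th cost signal of agent i, and b_j^{i} the corresponding safety bound. A joint policy satisfying all constraints is exactly the "guaranteed safe" joint behaviour the abstract describes.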

Cited by 6 publications (7 citation statements) | References 25 publications

Citation statements
“…They have been applied in the distribution network or microgrids to deal with the large-scale integrated renewable energy resources and voltage violations [23], [24]. However, algorithms always use a single value function or share the same state value function in multiagent RL which are basically trained in a centralized way [25]. They may suffer from heavy communication burdens, which is prone to communication failures.…”
Section: Introduction (mentioning, confidence: 99%)
“…When solving Dec-POMDPs or potential games, the framework of centralised training and decentralised execution (CTDE) is often employed (Rashid et al., 2018; Wen et al., 2018; Hu et al., 2021), where a centralised critic is trained to gather all agents' local observations and assign credits. While CTDE methods rely on the individual-global-max assumption (Son et al., 2019), another thread of work is built on the so-called advantage decomposition lemma (Kuba et al., 2021b), which holds in general for any co-operative games; such a lemma leads to provably convergent multi-agent trust-region methods (Kuba et al., 2021a) and constrained policy optimisation methods (Gu et al., 2021).…”
Section: Related Work (mentioning, confidence: 99%)
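The advantage decomposition lemma cited in this statement (Kuba et al., 2021b) is commonly stated as follows; the notation here is a sketch assumed by this summary, not quoted from the cited works. For any state s, any ordered subset of agents i_{1:m}, and any joint policy \boldsymbol{\pi},

\[
A_{\boldsymbol{\pi}}^{i_{1:m}}\!\left(s, \mathbf{a}^{i_{1:m}}\right) \;=\; \sum_{j=1}^{m} A_{\boldsymbol{\pi}}^{i_j}\!\left(s, \mathbf{a}^{i_{1:j-1}}, a^{i_j}\right),
\]

that is, the joint advantage of the subset splits into a sum of per-agent advantages, each conditioned on the actions already selected by the preceding agents. As the quote notes, this holds for any co-operative game and underpins the convergence guarantees of the trust-region and constrained policy optimisation methods it cites.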
“…The partial work of the study is shared on an open platform [21]. In this study, we provide a more comprehensive investigation and solution for safe MARL.…”
Section: Introduction (mentioning, confidence: 99%)