2019
DOI: 10.48550/arxiv.1909.03510
Preprint

Bi-level Actor-Critic for Multi-agent Coordination

Abstract: Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider Stackelberg equilibrium as a potentially better convergence point than Nash equilibrium in terms of Pareto superiority, especially in cooperative environments.…
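To make the leader–follower idea concrete, the sketch below shows how a Stackelberg action pair can be read off learned Q-values at a single state: the leader optimizes its own value under the assumption that the follower best-responds to whatever the leader plays. This is a minimal illustration in my own notation (tabular Q-values, one leader and one follower), not the paper's implementation of the bi-level actor-critic.

```python
import numpy as np

# Illustrative only: tabular Q-values for a single state, one leader and one
# follower, each with 3 actions. Names and shapes are assumptions, not the
# authors' code.
n_leader_actions, n_follower_actions = 3, 3
rng = np.random.default_rng(0)
q_leader = rng.normal(size=(n_leader_actions, n_follower_actions))    # leader's Q(s, aL, aF)
q_follower = rng.normal(size=(n_leader_actions, n_follower_actions))  # follower's Q(s, aL, aF)

def stackelberg_actions(q_leader, q_follower):
    """Upper level: the leader picks the action whose value is best, assuming the
    lower level (follower) best-responds to whatever the leader plays."""
    best_response = q_follower.argmax(axis=1)                      # follower's reply to each leader action
    leader_value = q_leader[np.arange(q_leader.shape[0]), best_response]
    a_leader = int(leader_value.argmax())                          # leader optimizes over its own actions
    a_follower = int(best_response[a_leader])                      # follower best-responds to that choice
    return a_leader, a_follower

print(stackelberg_actions(q_leader, q_follower))                   # prints the Stackelberg action pair
```

In a full bi-level actor-critic, both Q-functions would be learned by critics and the corresponding policies distilled by actors; the sketch only covers the equilibrium-selection step.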

Cited by 4 publications (6 citation statements); citing publications appeared in 2019–2021. References 6 publications.

Citation statements (ordered by relevance):
“…Actor-critic (AC) methods are a long-established class of techniques, consisting of two subproblems that are intertwined with each other. At the same time, AC methods have been widely studied in recent years [147], [148], [146], [21], [69], [65], [23]. With guaranteed convergence, [21] proposes a bi-level AC method to solve the multi-agent reinforcement learning problem of finding a Stackelberg equilibrium in Markov games.…”
Section: Deep Reinforcement Learning (mentioning)
confidence: 99%
“…At the same time, AC methods have been widely studied in recent years [147], [148], [146], [21], [69], [65], [23]. With guaranteed convergence, [21] proposes a bi-level AC method to solve the multi-agent reinforcement learning problem of finding a Stackelberg equilibrium in Markov games. It allows agents to have different knowledge bases (and thus different levels of intelligence), while their actions can still be executed simultaneously and in a distributed fashion.…”
Section: Deep Reinforcement Learning (mentioning)
confidence: 99%
“…The bilevel constrained MDP is a special form of bilevel reinforcement learning [19], since 1) both objectives are sums of discounted rewards over sequential states, and 2) the analytic forms of the objectives are unknown and can only be learned through interaction with the environment in a model-free way. The difference lies in the constraints, which in our model include both states and the policy.…”
Section: Mathematical Model (mentioning)
confidence: 99%
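For readers less familiar with the term, "bilevel reinforcement learning" in the quote above can be read as the generic leader–follower objective sketched below. The symbols (leader policy π^L, follower policy π^F, rewards r^L and r^F, discount γ) are my own notation, and the state/policy constraints that distinguish the bilevel constrained MDP are deliberately omitted.

$$
\max_{\pi^{L}}\ \mathbb{E}\Big[\textstyle\sum_{t\ge 0}\gamma^{t}\, r^{L}(s_t, a^{L}_t, a^{F}_t)\Big]
\quad \text{s.t.} \quad
\pi^{F}\in\arg\max_{\pi}\ \mathbb{E}\Big[\textstyle\sum_{t\ge 0}\gamma^{t}\, r^{F}(s_t, a^{L}_t, a_t)\Big].
$$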