Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence

Song, Yuhang; Wojcicki, Andrzej; Lukasiewicz, Thomas; Wang, Jianyi; Aryan, Abi; Xu, Zhenghua; Xu, Mai; Ding, Zihan; Wu, Lianlong

doi:10.1609/aaai.v34i05.6216

Cited by 14 publications

(6 citation statements)

References 18 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The objective for each agent is to maximize the expected cumulative reward received during the game. For a cooperative POSG, we quote the definition in Song et al (2020),…”

Section: Preliminariesmentioning

confidence: 99%

Dealing with Non-Stationarity in MARL via Trust-Region Decomposition

Li¹,

Wang²,

Jin³

et al. 2021

Preprint

View full text Add to dashboard Cite

Non-stationarity is one thorny issue in multi-agent reinforcement learning, which is caused by the policy changes of agents during the learning procedure. Current works to solve this problem have their own limitations in effectiveness and scalability, such as centralized critic and decentralized actor (CCDA), population-based self-play, modeling of others and etc. In this paper, we novelly introduce a δ-stationarity measurement to explicitly model the stationarity of a policy sequence, which is theoretically proved to be proportional to the joint policy divergence. However, simple policy factorization like mean-field approximation will mislead to larger policy divergence, which can be considered as trust region decomposition dilemma. We model the joint policy as a general Markov random field and propose a trust region decomposition network based on message passing to estimate the joint policy divergence more accurately. The Multi-Agent Mirror descent policy algorithm with Trust region decomposition, called MAMT, is established with the purpose to satisfy δ-stationarity. MAMT can adjust the trust region of the local policies adaptively in an end-to-end manner, thereby approximately constraining the divergence of joint policy to alleviate the non-stationary problem. Our method can bring noticeable and stable performance improvement compared with baselines in coordination tasks of different complexity.

show abstract

“…The objective for each agent is to maximize the expected cumulative reward received during the game. For a cooperative POSG, we quote the definition in Song et al (2020),…”

Section: Preliminariesmentioning

confidence: 99%

Dealing with Non-Stationarity in MARL via Trust-Region Decomposition

Li¹,

Wang²,

Jin³

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…Several papers Song et al (2020b); Lowe et al (2017) have introduced MARL benchmarks for specific domains, but do not measure generalisation and don't use learning agents to produce evaluation tasks. Another approach with a long history in game theory involves organizing a competition between strategies submitted by different research groups (e.g.…”

Section: Related Workmentioning

confidence: 99%

Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

Leibo,

Duéñez-Guzmán,

Vezhnevets

et al. 2021

Preprint

View full text Add to dashboard Cite

Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap, and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agent's behavior constitutes (part of) another agent's environment. To demonstrate scalability, we have created over 80 unique test scenarios covering a broad range of research topics such as social dilemmas, reciprocity, resource sharing, and task partitioning. We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.* Equal contribution 1 DeepMind 2 Google Brain.

show abstract

“…The objective for each agent is to maximize the expected cumulative reward received during the game. For a cooperative POSG, we quote the definition in Song et al [42],…”

Section: Preliminariesmentioning

confidence: 99%

Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning

Li,

Wang,

Jin

et al. 2021

Preprint

View full text Add to dashboard Cite

When solving a complex task, humans will spontaneously form teams and to complete different parts of the whole task, respectively. Meanwhile, the cooperation between teammates will improve efficiency. However, for current cooperative MARL methods, the cooperation team is constructed through either heuristics or end-to-end blackbox optimization. In order to improve the efficiency of cooperation and exploration, we propose a structured diversification emergence MARL framework named Rochico based on reinforced organization control and hierarchical consensus learning. Rochico first learns an adaptive grouping policy through the organization control module, which is established by independent multi-agent reinforcement learning. Further, the hierarchical consensus module based on the hierarchical intentions with consensus constraint is introduced after team formation. Simultaneously, utilizing the hierarchical consensus module and a self-supervised intrinsic reward enhanced decision module, the proposed cooperative MARL algorithm Rochico can output the final diversified multi-agent cooperative policy. All three modules are organically combined to promote the structured diversification emergence. Comparative experiments on four large-scale cooperation tasks show that Rochico is significantly better than the current SOTA algorithms in terms of exploration efficiency and cooperation strength.

show abstract

Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence

Cited by 14 publications

References 18 publications

Dealing with Non-Stationarity in MARL via Trust-Region Decomposition

Dealing with Non-Stationarity in MARL via Trust-Region Decomposition

Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning

Contact Info

Product

Resources

About