2019
DOI: 10.48550/arxiv.1910.05366
Preprint
Learning Nearly Decomposable Value Functions Via Communication Minimization

Abstract: Reinforcement learning encounters major challenges in multi-agent settings, such as scalability and non-stationarity. Recently, value function factorization learning has emerged as a promising way to address these challenges in collaborative multi-agent systems. However, existing methods have focused on learning fully decentralized value functions, which are not efficient for tasks requiring communication. To address this limitation, this paper presents a novel framework for learning nearly decomposable value…

Cited by 12 publications (33 citation statements)
References 10 publications
“…where K is the number of attention heads, λ_{i,k}(τ, a) and φ_{i,k}(τ) are attention weights activated by a sigmoid regularizer, and υ_k(τ) > 0 is a positive key of each head. This sigmoid activation of λ_i brings sparsity to the credit assignment of the joint advantage function to individuals, which enables efficient multi-agent learning [19].…”
Section: Transformation Network Module Uses the Centralized Information… (mentioning)
confidence: 99%
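The sigmoid-gated, multi-head credit weighting described in this statement can be sketched in a few lines. The snippet below is an illustration only, not the cited paper's exact architecture; the module and dimension names (AttentionCredit, tau_dim, act_dim, n_heads) are assumptions made for the example.

```python
# Illustrative sketch (not the cited paper's exact network): sigmoid-activated
# multi-head attention weights that sparsely assign a joint advantage to an agent.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCredit(nn.Module):
    def __init__(self, tau_dim: int, act_dim: int, n_heads: int = 4):
        super().__init__()
        # lambda_{i,k}(tau, a): head weights from the agent's history and action
        self.lam = nn.Linear(tau_dim + act_dim, n_heads)
        # phi_{i,k}(tau): head weights from the agent's history alone
        self.phi = nn.Linear(tau_dim, n_heads)
        # upsilon_k(tau): positive key of each head (softplus keeps it > 0)
        self.key = nn.Linear(tau_dim, n_heads)

    def forward(self, tau: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Sigmoid activation pushes many head weights toward 0, i.e. sparse credit.
        lam = torch.sigmoid(self.lam(torch.cat([tau, act], dim=-1)))  # (B, K)
        phi = torch.sigmoid(self.phi(tau))                            # (B, K)
        key = F.softplus(self.key(tau)) + 1e-6                        # (B, K), > 0
        # Agent i's credit weight: sum over the K heads.
        return (key * lam * phi).sum(dim=-1)                          # (B,)

# Usage sketch: scale a joint advantage A(tau, a) by each agent's credit weight.
# w_i = AttentionCredit(tau_dim=64, act_dim=8)(tau_i, a_i)
```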
“…The closest paper to our work is NDQ [44], which also utilizes latent variables to represent the information, but as communication messages during the agents' decentralized execution. Although we both treat information extraction as an information bottleneck problem, there are several key differences between our work and NDQ: (I) NDQ is a value-based method, while our work is a policy-based method under the soft actor-critic framework.…”
Section: Related Work (mentioning)
confidence: 99%
“…The main shortcoming is that computing the mutual information is computationally challenging. Inspired by recent advances in Bayesian inference and variational auto-encoders [17,28,44], we propose a novel way of representing it by utilizing latent vectors from variational inference models with an information-theoretic regularization method, and then derive the evidence lower bound (ELBO) of its objective.…”
Section: A Mathematical Details, A.1 Boundaries for Extra-State Information… (mentioning)
confidence: 99%
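As a rough illustration of the variational, information-bottleneck construction this statement refers to, the sketch below encodes a latent vector as a reparameterized diagonal Gaussian and penalizes its KL divergence from a unit Gaussian prior, the regularization term that appears in an ELBO-style objective. Module and variable names (LatentEncoder, z_dim, beta) are assumptions for the example, not the cited paper's code.

```python
# Minimal variational-inference sketch: q(z | x) as a diagonal Gaussian with a
# KL penalty toward N(0, I), the information-bottleneck term of an ELBO objective.
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    def __init__(self, in_dim: int, z_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.log_var = nn.Linear(in_dim, z_dim)

    def forward(self, x: torch.Tensor):
        mu, log_var = self.mu(x), self.log_var(x)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=-1)
        return z, kl

# ELBO-style training loss: a task/reconstruction term plus a beta-weighted KL penalty.
# z, kl = LatentEncoder(in_dim=128, z_dim=16)(x)
# loss = task_loss(z) + beta * kl.mean()
```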
“…ROMA [30] constructs a stochastic role embedding space to lead agents to different policies based on different roles. NDQ [32] learns a message representation to achieve expressive and succinct communication. RODE [31] uses an action encoder to learn action representations and applies clustering methods to decompose joint action spaces into restricted role action spaces to reduce the policy search space.…”
Section: Related Work (mentioning)
confidence: 99%