2019
DOI: 10.1145/3366701

Social Learning in Multi Agent Multi Armed Bandits

Abstract: Motivated by the emerging need for learning algorithms in large-scale networked and decentralized systems, we introduce a distributed version of the classical stochastic Multi-Armed Bandit (MAB) problem. Our setting consists of a large number of agents n that collaboratively and simultaneously solve the same instance of a K-armed MAB to minimize the average cumulative regret over all agents. The agents can communicate and collaborate with each other only through a pairwise asynchronous gossip-based protocol that exch…
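The setting sketched in the abstract (many agents, one shared K-armed bandit, pairwise asynchronous gossip) can be illustrated with a toy simulation. The following Python sketch is not the paper's algorithm: it assumes each agent runs plain UCB1 and that gossiping pairs naively merge their per-arm statistics; all names (GossipAgent, gossip_pair, simulate) are illustrative.

```python
import random
import math

class GossipAgent:
    """One agent running UCB1 on a K-armed bandit, with gossip-merged statistics."""
    def __init__(self, k):
        self.k = k
        self.counts = [0] * k          # pulls recorded per arm (own + absorbed via gossip)
        self.means = [0.0] * k         # empirical mean reward per arm
        self.t = 0

    def select_arm(self):
        self.t += 1
        for a in range(self.k):        # play each arm once before using UCB indices
            if self.counts[a] == 0:
                return a
        ucb = [self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a])
               for a in range(self.k)]
        return max(range(self.k), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def gossip_pair(a, b):
    """Pairwise exchange: both agents adopt the pooled per-arm statistics.

    Naive merge for illustration only: repeated gossip between the same pair
    double-counts shared history, which a real protocol would have to avoid.
    """
    for arm in range(a.k):
        n = a.counts[arm] + b.counts[arm]
        if n == 0:
            continue
        m = (a.counts[arm] * a.means[arm] + b.counts[arm] * b.means[arm]) / n
        a.counts[arm] = b.counts[arm] = n
        a.means[arm] = b.means[arm] = m


def simulate(n=20, k=5, horizon=2000, gossip_prob=0.05, seed=0):
    rng = random.Random(seed)
    mu = [rng.random() for _ in range(k)]          # unknown Bernoulli arm means
    best = max(mu)
    agents = [GossipAgent(k) for _ in range(n)]
    regret = 0.0
    for _ in range(horizon):
        for ag in agents:                          # every agent pulls each round
            arm = ag.select_arm()
            reward = 1.0 if rng.random() < mu[arm] else 0.0
            ag.update(arm, reward)
            regret += best - mu[arm]
        if rng.random() < gossip_prob:             # occasional asynchronous pairwise gossip
            i, j = rng.sample(range(n), 2)
            gossip_pair(agents[i], agents[j])
    return regret / n                              # average cumulative regret over agents

if __name__ == "__main__":
    print("average cumulative regret:", round(simulate(), 2))
```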

Cited by 34 publications (7 citation statements) · References 47 publications
“…Here, we define communication complexity as the total number of message exchanges, with the message consisting of arm index, observed reward, and possibly other information. In homogeneous settings, where all agents are identical in that K_v = K for all v ∈ V, arm elimination-type algorithms [61] as well as gossip-type protocols [16,51] have been shown to be communication efficient and effective in terms of group regret.…”
Section: Goals (mentioning)
confidence: 99%
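That notion of communication complexity (every transmitted message counts once and carries at least an arm index and an observed reward) is easy to instrument. A minimal helper, with all names assumed rather than taken from any cited paper:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Message:
    arm: int        # arm index being reported
    reward: float   # observed reward for that arm

@dataclass
class CommunicationLog:
    """Tallies message exchanges, the complexity measure described above."""
    exchanges: List[Tuple[int, int, Message]] = field(default_factory=list)

    def send(self, sender: int, receiver: int, msg: Message) -> None:
        self.exchanges.append((sender, receiver, msg))

    @property
    def total_messages(self) -> int:
        return len(self.exchanges)

# Usage: route every gossip / flooding transmission through the log.
log = CommunicationLog()
log.send(sender=0, receiver=1, msg=Message(arm=3, reward=0.7))
print(log.total_messages)   # -> 1
```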
“…A more realistic scenario in such systems is that agents form the nodes of an underlying communication network, where adjacent nodes can exchange messages but cannot reach more distant nodes in just one hop. This scenario has been investigated under various conditions, using simple communication protocols such as flooding ("message-passing") algorithms and gossiping to disseminate information [11,16,41,42,51,54,57].…”
Section: Introduction (mentioning)
confidence: 99%
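The one-hop constraint in this statement can be pictured with a flooding sketch: a node's observation reaches only its graph neighbors in a single round and spreads hop by hop afterwards. The graph, payload format, and round structure below are assumptions for illustration, not any cited protocol.

```python
import collections

def flood(adjacency, source, payload, rounds):
    """Disseminate `payload` (e.g. an (arm, reward) observation) from `source`
    over an undirected graph by per-round flooding to one-hop neighbors.

    adjacency: dict mapping node -> list of neighbor nodes.
    Returns dict of node -> payload for every node reached within `rounds` hops.
    """
    received = {source: payload}
    frontier = collections.deque([source])
    for _ in range(rounds):
        next_frontier = collections.deque()
        while frontier:
            node = frontier.popleft()
            for nb in adjacency[node]:           # only adjacent nodes are reachable per hop
                if nb not in received:
                    received[nb] = payload
                    next_frontier.append(nb)
        frontier = next_frontier
    return received

# Example: a 6-node ring; an observation from node 0 needs 3 rounds to reach all nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(sorted(flood(ring, 0, ("arm", 2, 0.9), rounds=1)))   # [0, 1, 5]
print(sorted(flood(ring, 0, ("arm", 2, 0.9), rounds=3)))   # [0, 1, 2, 3, 4, 5]
```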
“…The work in [251] established the first logarithmic upper bound on the number of communication rounds needed for an optimal regret bound. The authors considered a complete graph network topology, wherein a set of agents are initialized with a disjoint set of arms.…”
Section: VI-A2 Distributed Bandits Formulations (mentioning)
confidence: 99%
“…The work in [142] established the first logarithmic upper bound on the number of communication rounds needed for an optimal regret bound. The authors considered a complete graph network topology, wherein a set of agents are initialized with a disjoint set of arms.…”
(mentioning)
confidence: 99%
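The construction described in the two statements above (agents on a complete graph starting from disjoint subsets of arms, communicating in only logarithmically many rounds) can be sketched as a disjoint initialization plus a doubling round schedule. Function names and the doubling schedule itself are assumptions for illustration, not taken from the cited works.

```python
import math

def disjoint_arm_split(num_arms, num_agents):
    """Partition arm indices 0..num_arms-1 into (nearly) disjoint blocks, one per agent."""
    arms = list(range(num_arms))
    block = math.ceil(num_arms / num_agents)
    return [arms[i * block:(i + 1) * block] for i in range(num_agents)]

def communication_rounds(horizon):
    """Doubling schedule: communicate at t = 1, 2, 4, 8, ... <= horizon,
    i.e. only O(log T) communication rounds over a horizon of T steps."""
    t, rounds = 1, []
    while t <= horizon:
        rounds.append(t)
        t *= 2
    return rounds

print(disjoint_arm_split(num_arms=10, num_agents=3))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(communication_rounds(horizon=1000))             # [1, 2, 4, ..., 512] -> 10 rounds
```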