2017
DOI: 10.48550/arxiv.1711.02317
Preprint

Multi-Player Bandits Revisited

Lilian Besson,
Emilie Kaufmann

Abstract: Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assumes that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithm and introduce two algorithms, RandTopM and MCTopM…
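For reference, the regret that such lower bounds concern is commonly defined as below. This is an assumed standard definition, not text recovered from the truncated abstract: with M players, K ≥ M arms, μ*_k the k-th largest arm mean, and r^j(t) the reward of player j at round t under the collision model in which colliding players receive nothing, the regret after T rounds compares the best M arms played without collisions against what the players actually collect.

```latex
R_T \;=\; T \sum_{k=1}^{M} \mu_k^{*} \;-\; \mathbb{E}\!\left[ \sum_{t=1}^{T} \sum_{j=1}^{M} r^{j}(t) \right]
```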

Cited by 4 publications (8 citation statements)
References 20 publications (46 reference statements)
“…In our system, as the agents have uniform rank across all arms there exists a unique stable matching (which is not true when agents are ranked non-uniformly across arms.) Indeed, in any stable match agent j must match with its most preferred arm which is not matched with any agent with rank (j − 1) or higher.⁵…”
Section: Problem Setting | mentioning
confidence: 99%
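The quoted passage describes a greedy rule: going down the ranking, each agent takes its most preferred arm not already taken by a higher-ranked agent (the mechanism usually called serial dictatorship). A minimal Python sketch of that rule, assuming each agent's preferences are given as an ordered list of arm indices and agents are indexed 0, 1, 2, … by rank; the function name stable_matching_uniform_rank is hypothetical and this is not code from the cited work:

```python
from typing import Dict, List


def stable_matching_uniform_rank(preferences: List[List[int]]) -> Dict[int, int]:
    """Match agents to arms when every arm ranks the agents identically.

    preferences[j] lists arm indices for agent j, most preferred first.
    Agents are indexed by rank: agent 0 is ranked highest, then agent 1, etc.
    Returns a dict mapping each matched agent to its arm; an agent left
    without a free arm (more agents than arms) is simply absent.
    """
    taken = set()  # arms already claimed by higher-ranked agents
    matching: Dict[int, int] = {}
    for agent, prefs in enumerate(preferences):  # visit agents in rank order
        for arm in prefs:
            if arm not in taken:  # most preferred arm that is still free
                matching[agent] = arm
                taken.add(arm)
                break
    return matching


# Three agents; arm 0 is everyone's favourite, so only agent 0 gets it.
print(stable_matching_uniform_rank([[0, 1, 2], [0, 2, 1], [0, 1, 2]]))
# prints {0: 0, 1: 2, 2: 1}
```

Processing agents strictly in rank order is what makes the result unique: no lower-ranked agent can displace a higher-ranked one, so no blocking pair can form.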
“…Thus, agents cannot infer if its actions cause a collision to other higher … The arm-means for sub-optimal arms for each agent are chosen i.i.d. uniformly over [0, 0.8], while the arm-mean of i ∈ [5] for agent i was set to 0.9. The rewards are binary.…”
Section: Comparison With Regret Bounds For Related Models | mentioning
confidence: 99%
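The experimental setup in this excerpt can be generated as a synthetic instance. A minimal sketch, assuming 5 agents (from i ∈ [5]), 0-based indices, and a hypothetical arm count num_arms, which the excerpt does not state; make_instance and pull_arm are illustrative names, not taken from the cited paper:

```python
import random
from typing import List


def make_instance(num_agents: int = 5, num_arms: int = 10, seed: int = 0) -> List[List[float]]:
    """Return means[i][k], the Bernoulli mean of arm k for agent i.

    Assumption: arm i is agent i's best arm with mean 0.9; every other
    arm gets a mean drawn i.i.d. uniformly from [0, 0.8].
    """
    if num_arms < num_agents:
        raise ValueError("need at least one dedicated arm per agent")
    rng = random.Random(seed)  # seeded for reproducibility
    means = []
    for i in range(num_agents):
        row = [rng.uniform(0.0, 0.8) for _ in range(num_arms)]  # sub-optimal arms
        row[i] = 0.9  # agent i's optimal arm
        means.append(row)
    return means


def pull_arm(mean: float, rng: random.Random) -> int:
    """Draw one binary (Bernoulli) reward with the given mean."""
    return 1 if rng.random() < mean else 0


means = make_instance(num_agents=5, num_arms=10, seed=1)
rng = random.Random(42)
print([pull_arm(means[0][k], rng) for k in range(10)])  # one reward per arm for agent 0
```

Giving every agent a distinct best arm at 0.9, well above the 0.8 ceiling of the other arms, keeps the optimal assignment unambiguous while the uniform draws vary the difficulty of the sub-optimal gaps.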
“…Due to decentralized and simultaneous policy selection, collisions have to be taken into consideration when more than one player happen to choose the same arm. For this situation, a number of studies [21]- [23] assume that no player receives any reward, while some other studies [20] assume that the colliding players split the reward over the single arm in an arbitrary way. Such a model is frequently used to describe the user-channel matching problem in a CRN, where the channel condition is modeled to be stochastic, partially due to the unpredictable activities of the primary users [16], [25].…”
Section: B. MP-MAB for Resource Allocation in Wireless Network | mentioning
confidence: 99%
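The two collision models named in this excerpt can be compared on a single round. A minimal sketch, assuming each player picks one arm and each arm produces one reward draw per round; round_rewards is a hypothetical helper, and the equal split is only one of the arbitrary splitting rules the excerpt allows:

```python
from typing import Dict, List, Sequence


def round_rewards(choices: Sequence[int], draws: Sequence[float], model: str = "no_reward") -> List[float]:
    """Per-player reward for one round.

    choices[j]: arm chosen by player j.
    draws[k]: reward generated by arm k this round.
    model="no_reward": every colliding player gets 0.
    model="split": colliding players share the arm's reward equally
    (an illustrative choice; the excerpt allows any split).
    """
    counts: Dict[int, int] = {}
    for arm in choices:  # how many players picked each arm
        counts[arm] = counts.get(arm, 0) + 1
    rewards: List[float] = []
    for arm in choices:
        n = counts[arm]
        if n == 1:
            rewards.append(draws[arm])  # no collision: full reward
        elif model == "no_reward":
            rewards.append(0.0)
        elif model == "split":
            rewards.append(draws[arm] / n)
        else:
            raise ValueError(f"unknown collision model: {model}")
    return rewards


# Players 0 and 1 collide on arm 0; player 2 is alone on arm 1.
print(round_rewards([0, 0, 1], [1.0, 1.0], model="no_reward"))  # [0.0, 0.0, 1.0]
print(round_rewards([0, 0, 1], [1.0, 1.0], model="split"))      # [0.5, 0.5, 1.0]
```

Under the no-reward model a collision destroys the arm's reward for everyone involved, which is why decentralized algorithms must actively avoid collisions; under a splitting model the total reward collected on the arm is preserved.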