2019
DOI: 10.1109/jsac.2019.2934003

Multi-Player Multi-Armed Bandits for Stable Allocation in Heterogeneous Ad-Hoc Networks

Abstract: Next-generation networks are expected to be ultra-dense and aim to explore the spectrum-sharing paradigm that allows users to communicate in licensed, shared, as well as unlicensed spectrum. Such ultra-dense networks will incur significant signaling load at base stations, leading to a negative effect on spectrum and energy efficiency. To minimize signaling overhead, an ad-hoc approach is being considered for users communicating in the unlicensed and shared spectrum. For such users, decisions need to be completely dec…

Cited by 41 publications (33 citation statements)
References 19 publications
“…In this article, we explore MAB algorithms for the decision-making tasks [17]–[22]. MAB algorithms are designed to identify the best arm among several arms in an unknown environment.…”
Section: Related Work (mentioning, confidence: 99%)
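As a concrete illustration of the quoted idea, the sketch below (not from the cited paper; the Bernoulli arm means and sample budget are assumptions chosen for illustration) identifies the best arm by sampling each arm uniformly and comparing empirical means:

import random

ARM_MEANS = [0.3, 0.5, 0.7, 0.4]  # hidden Bernoulli success probabilities (assumed)

def pull(arm: int) -> int:
    """Draw a Bernoulli reward from the chosen arm."""
    return 1 if random.random() < ARM_MEANS[arm] else 0

def identify_best_arm(samples_per_arm: int = 500) -> int:
    """Sample every arm uniformly and return the arm with the
    highest empirical mean reward."""
    means = []
    for arm in range(len(ARM_MEANS)):
        rewards = [pull(arm) for _ in range(samples_per_arm)]
        means.append(sum(rewards) / samples_per_arm)
    return max(range(len(means)), key=lambda i: means[i])

if __name__ == "__main__":
    print("estimated best arm:", identify_best_arm())  # likely arm 2

Uniform sampling wastes pulls on bad arms, which is exactly what the exploration/exploitation algorithms quoted next improve on.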
“…They guarantee an optimal balance between exploration (select all arms a sufficient number of times) and exploitation (select the best arm as many times as possible). Popular MAB algorithms include the upper confidence bound (UCB) algorithm and its extensions (UCB_V and UCB_T), the Kullback–Leibler (KL) divergence-based UCB algorithm (KL-UCB), and Thompson sampling (TS) [17]–[22]. To the best of our knowledge, none of these algorithms have ever been realized on an SoC.…”
Section: Related Work (mentioning, confidence: 99%)
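A minimal UCB1 sketch of the exploration/exploitation balance the quote describes is given below; this is the textbook algorithm rather than any cited paper's implementation, and the arm means and horizon are assumed values:

import math
import random

ARM_MEANS = [0.3, 0.5, 0.7, 0.4]  # hidden Bernoulli parameters (assumed)

def pull(arm: int) -> int:
    return 1 if random.random() < ARM_MEANS[arm] else 0

def ucb1(horizon: int = 10_000) -> list:
    n_arms = len(ARM_MEANS)
    counts = [0] * n_arms   # times each arm was played
    sums = [0.0] * n_arms   # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # play every arm once to initialize
        else:
            # empirical mean + exploration bonus sqrt(2 ln t / n_i)
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        sums[arm] += pull(arm)
    return counts

if __name__ == "__main__":
    print("play counts per arm:", ucb1())  # the best arm dominates

The bonus term shrinks as an arm accumulates samples, so under-explored arms are periodically revisited while the empirically best arm is played most often.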
“…Distributed MABs, which are extensions of basic MABs, have been studied extensively in recent years in different settings [6,15,19,20,27,28,31,33,36]. Distributed bandits are well motivated by a broad range of application scenarios, such as (1) large-scale learning systems [13], in domains such as online advertising and recommendation systems; (2) cooperative search by multiple robots [18,25]; (3) applications in wireless cognitive radio [5,12,26,27]; and (4) distributed learning in geographically distributed communication systems, such as a set of IoT devices learning about the underlying environments [3,7,14,29,37]. Most prior work on multi-agent MABs assumes that agents are homogeneous: all agents have full access to the set of all arms, and hence they solve the same instance of a MAB problem, with the aim of minimizing the aggregate regret of the agents either in a competition setting [2,5,6,8,9,12,26,27,36], i.e., degraded or no reward when multiple agents pull the same arm, or in a collaboration/cooperation setting [20,22,23,28,31,36], where agents pulling the same arm observe independent rewards, and agents can communicate their observations to each other in order to improve their learning performance.…”
Section: Introduction (mentioning, confidence: 99%)
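The competition setting described in the quote (degraded or no reward on simultaneous pulls) can be sketched as follows; the collision-to-zero-reward rule, arm means, and player count are assumptions for illustration, not the model of any specific cited paper:

import random

ARM_MEANS = [0.2, 0.5, 0.8]  # hidden Bernoulli parameters (assumed)

def play_round(choices: list) -> list:
    """Each player picks an arm; colliding players receive zero reward."""
    rewards = []
    for arm in choices:
        if choices.count(arm) > 1:  # collision: no reward in this model
            rewards.append(0.0)
        else:
            rewards.append(1.0 if random.random() < ARM_MEANS[arm] else 0.0)
    return rewards

if __name__ == "__main__":
    print(play_round([2, 2]))  # both players on arm 2 -> [0.0, 0.0]
    print(play_round([2, 1]))  # distinct arms -> independent stochastic rewards

Players that settle on distinct arms avoid collisions, which is the stable-allocation problem the surveyed paper targets.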
“…These studies seldom consider the uncertainty of users' behaviours, so this paper introduces an online learning method called multi-armed bandits (MAB) to solve the problem. MAB has shown effectiveness and merit in air-conditioning demand aggregation [16] and many other sequential decision-making problems containing uncertain/unknown behavioural factors [17][18][19][20][21][22][23][24][25][26][27]. In reference [28], an adversarial MAB framework is applied to learn the signal response of thermal control loads for demand response in real time.…”
Section: Introduction (mentioning, confidence: 99%)
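For the adversarial MAB framework mentioned in connection with reference [28], a minimal EXP3 sketch is shown below; EXP3 is the standard adversarial-bandit algorithm, and the synthetic reward source and gamma value here are assumptions:

import math
import random

def exp3(n_arms: int, horizon: int, gamma: float = 0.1) -> list:
    """Run EXP3 against a synthetic reward source; returns final weights."""
    weights = [1.0] * n_arms
    for _ in range(horizon):
        total = sum(weights)
        # mix exponential weights with uniform exploration
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        # placeholder adversary: higher-index arms tend to pay more
        reward = random.random() * (arm + 1) / n_arms
        est = reward / probs[arm]  # importance-weighted, unbiased estimate
        weights[arm] *= math.exp(gamma * est / n_arms)
        peak = max(weights)        # renormalize to keep weights stable
        weights = [w / peak for w in weights]
    return weights

if __name__ == "__main__":
    print(exp3(n_arms=3, horizon=5_000))  # weight concentrates on the best arm

The importance-weighted estimate lets the learner update without assuming rewards are drawn from a fixed distribution, which is what distinguishes the adversarial setting from the stochastic sketches above.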