2019
DOI: 10.3233/jifs-179052
A comparison between UCB and UCB-Tuned as selection policies in GGP



Cited by 14 publications (16 citation statements)
References 12 publications
“…LinUCB [20] extends Auer’s UCB algorithm in [10, 22] to the contextual setting. Its main idea is to estimate each arm’s expected reward by fitting a linear relationship between the arm’s previous rewards and its current context vector, as given in (10).…”
Section: Proposed EA-CMAB Algorithms
confidence: 99%
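The LinUCB rule quoted above scores each arm with a linear reward estimate plus a confidence bonus derived from the context. A minimal sketch in plain Python, assuming a 2-D context vector and an illustrative exploration weight `alpha` (the class and helper names are hypothetical, not from the cited papers):

```python
import math

def mat_vec(A, x):
    # multiply a 2x2 matrix by a 2-vector
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def inv2(A):
    # closed-form inverse of a 2x2 matrix
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

class LinUCBArm:
    """One arm's statistics for a minimal LinUCB sketch (2-D context)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                   # exploration weight
        self.A = [[1.0, 0.0], [0.0, 1.0]]    # ridge-regularised Gram matrix
        self.b = [0.0, 0.0]                  # reward-weighted context sum

    def score(self, x):
        # UCB score: linear reward estimate plus confidence bonus
        A_inv = inv2(self.A)
        theta = mat_vec(A_inv, self.b)       # least-squares coefficient estimate
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * yi for xi, yi in zip(x, mat_vec(A_inv, x))))
        return mean + self.alpha * width

    def update(self, x, reward):
        # fold the observed (context, reward) pair into the statistics
        for i in range(2):
            for j in range(2):
                self.A[i][j] += x[i] * x[j]
            self.b[i] += reward * x[i]
```

At each round the learner computes `score(x)` for every arm under the current context `x` and plays the arm with the highest score; the bonus shrinks as an arm accumulates observations in similar contexts.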
“…We examine the effect of Wi-Fi contextual information on overall system performance by leveraging the LinUCB [20] and CTS [21] algorithms and comparing them with their noncontextual versions, i.e., UCB [22] and TS [23].…”
Section: Introduction
confidence: 99%
“…The motivation behind using online learning comes from its ability to deal effectively with both complex and dynamic environments [26] without any prior information: an agent learns to improve its future actions based only on its past actions and observations. Towards this end, the gateway UAV selection problem is formulated as a budget-constrained multi-player multi-armed bandit (MAB) problem [27][28][29]. MAB is a type of online learning in which an agent seeks to maximize its long-term reward (minimize regret) by either exploiting its previously best arm or investigating new choices, known as the exploitation-exploration tradeoff [27][28][29].…”
Section: Introduction
confidence: 99%
“…Since MAB techniques work online, without any prior knowledge of the environment beyond the player’s observations, they are considered the most appropriate solutions for the problem at hand.…”
Section: Introduction
confidence: 99%
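The exploitation-exploration tradeoff described in this excerpt is what the UCB index in the indexed paper resolves: each arm's score is its empirical mean plus a bonus that grows for rarely played arms. A minimal UCB1 sketch (the counts, means, and function name are illustrative assumptions):

```python
import math

def ucb1_select(counts, means, t):
    """Return the arm index maximising the UCB1 score mean + sqrt(2 ln t / n).

    counts: plays per arm; means: empirical mean reward per arm; t: round number.
    Any arm that has never been played is tried first.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

UCB-Tuned, the variant compared against plain UCB in the indexed paper, replaces the constant 2 in the bonus with a per-arm term min(1/4, V_i(n)) based on the arm's empirical variance, which typically tightens the bonus for low-variance arms.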