2016
DOI: 10.1007/s13226-016-0186-3
Mechanisms with learning for stochastic multi-armed bandit problems

Cited by 6 publications (4 citation statements)
References 11 publications
“…Thompson sampling was arbitrarily selected as the policy tool for this case study using the Multi-Armed Bandit (MAB) approach, owing to its adept balance between exploration and exploitation, as well as its parameter-free nature. Thompson sampling is particularly useful in scenarios where action rewards follow a Bernoulli distribution, distinguishing successes from failures [12]. This policy is specifically tailored to tackle online decision problems, including the Multi-Armed Bandit problem.…”
Section: The MAB-problem-based Algorithm
confidence: 99%
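The passage above describes Thompson sampling for arms with Bernoulli rewards. A minimal sketch of that idea, assuming Beta(1, 1) priors and illustrative arm success probabilities (both are assumptions for the example, not taken from the cited papers):

```python
# Sketch of Thompson sampling for Bernoulli-reward bandits: each arm keeps a
# Beta posterior over its success probability; at each round, sample from every
# posterior and play the arm with the highest sample.
import random

def thompson_sampling(true_probs, horizon=10_000):
    n_arms = len(true_probs)
    successes = [0] * n_arms   # observed successes per arm
    failures = [0] * n_arms    # observed failures per arm
    total_reward = 0
    for _ in range(horizon):
        # Draw a plausible success probability for each arm from Beta(s+1, f+1)
        samples = [random.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if random.random() < true_probs[arm] else 0  # Bernoulli reward
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    # Illustrative probabilities only; the best arm (0.7) dominates over time.
    print(thompson_sampling([0.3, 0.5, 0.7]))
```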
“…When online learning is involved, the stochastic multiarmed bandit (MAB) problem captures the exploration vs. exploitation trade-off effectively [19,23,21,22,28,32,31,6]. The classical MAB problem involves learning the optimal agent from a set of agents with a fixed but unknown reward distribution [7,28,3,30].…”
Section: Related Work
confidence: 99%
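The quoted passage frames the stochastic MAB problem as learning the best option from a set with fixed but unknown reward distributions. A small sketch of a standard index policy (UCB1) that makes the exploration vs. exploitation trade-off concrete; the arm probabilities here are illustrative assumptions:

```python
# UCB1 sketch for the stochastic MAB setting: each arm has a fixed but unknown
# Bernoulli reward distribution, and the index adds an exploration bonus to the
# empirical mean so under-sampled arms keep getting tried.
import math
import random

def ucb1(true_probs, horizon=10_000):
    n_arms = len(true_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialize its estimate
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1 if random.random() < true_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts  # pull counts concentrate on the optimal arm as t grows
```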
“…We begin with NAIVE (Algorithm 1), a variant of the exploration-separated policy EXPSEP [21] that achieves a sub-linear regret guarantee in terms of the time horizon T. It is easy to see that NAIVE is fair.…”
Section: Warming Up - NAIVE Algorithm
confidence: 99%
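The exact NAIVE/EXPSEP pseudocode is not reproduced in this report; as a rough illustration only, an exploration-separated policy keeps exploration and exploitation in distinct phases, e.g. round-robin exploration for a fixed budget followed by committing to the empirically best arm. A generic sketch under that assumption:

```python
# Generic explore-then-commit sketch (an assumption-level illustration, not the
# NAIVE/EXPSEP algorithm from the cited papers): explore all arms uniformly for
# `explore_rounds` pulls, then commit to the empirically best arm.
import random

def explore_then_commit(true_probs, horizon=10_000, explore_rounds=1_000):
    n_arms = len(true_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms

    def pull(arm):
        reward = 1 if random.random() < true_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        return reward

    total = 0
    for t in range(explore_rounds):
        total += pull(t % n_arms)      # exploration phase: round-robin over arms
    best = max(range(n_arms), key=lambda a: means[a])
    for _ in range(horizon - explore_rounds):
        total += pull(best)            # exploitation phase: commit to best estimate
    return total
```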