2017
DOI: 10.48550/arxiv.1707.02038
Preprint
A Tutorial on Thompson Sampling

Cited by 70 publications (91 citation statements)
References 36 publications
“…WSLTS performs Thompson Sampling using the reshaped posterior, excluding the previously selected arm. That is, WSLTS follows standard Thompson Sampling for Bernoulli bandits, as described in [15], with two important differences: First, as opposed to sampling from the posterior over all arms, the sampled reward probability of the previously selected arm is set to zero. This is to ensure that WSLTS follows the core semantics of WSLS as a strict generalization that allows for more sophisticated exploration/exploitation mechanisms.…”
Section: Win-Stay Lose-Thompson-Sample (WSLTS)
confidence: 99%
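The lose-branch of WSLTS quoted above can be sketched for Bernoulli bandits. The function name, the Beta(1,1) priors, and the count-based posterior representation below are illustrative assumptions, not the cited paper's implementation; the only step taken from the excerpt is forcing the previously selected arm's sampled reward probability to zero:

```python
import random

def wslts_select(successes, failures, prev_arm):
    """One lose-branch step of WSLTS (sketch).

    successes/failures: per-arm Bernoulli outcome counts, which under
    assumed Beta(1, 1) priors give Beta(1 + s, 1 + f) posteriors.
    prev_arm: index of the previously selected arm.
    """
    samples = []
    for k in range(len(successes)):
        if k == prev_arm:
            # Core WSLTS modification from the excerpt: the previous
            # arm's sampled reward probability is set to zero, so it
            # cannot be selected again on a "lose" step.
            samples.append(0.0)
        else:
            samples.append(random.betavariate(1 + successes[k],
                                              1 + failures[k]))
    # Standard Thompson Sampling choice: arm with the largest sample.
    return max(range(len(samples)), key=samples.__getitem__)
```

Because the previous arm's sample is exactly zero while every other arm's Beta draw is positive almost surely, the previous arm is excluded, matching the "strict generalization of WSLS" semantics described in the excerpt.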
“…Independent beta-distributed priors with parameters α k = 1 and β k = 1 (corresponding to a uniform distribution) over the estimation of each p k are assumed. At each iteration of TS, a sample is drawn from the posterior distribution of p k for each arm, and the arm with the largest sampled value is selected (Chapelle and Li, 2011;Russo et al, 2017). Choosing actions with TS balances exploration and exploitation in the long run, sampling from arms with the goal of converging on an optimal arm asymptotically (Agrawal and Goyal, 2012).…”
Section: Beta-Bernoulli Thompson Sampling
confidence: 99%
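The Beta-Bernoulli procedure quoted above (uniform Beta(1,1) priors, sample each arm's posterior, select the argmax) can be sketched as follows; the function names and the simulated-environment loop are illustrative assumptions:

```python
import random

def thompson_step(alpha, beta):
    """Draw one sample from each arm's Beta posterior and
    select the arm with the largest sampled value."""
    samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_p, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling.

    true_p: hidden per-arm success probabilities (for simulation only).
    Returns the number of times each arm was pulled.
    """
    random.seed(seed)
    K = len(true_p)
    alpha = [1.0] * K  # alpha_k = 1: uniform Beta(1, 1) prior
    beta = [1.0] * K   # beta_k = 1
    pulls = [0] * K
    for _ in range(horizon):
        k = thompson_step(alpha, beta)
        reward = 1 if random.random() < true_p[k] else 0
        # Conjugate update: success increments alpha, failure beta.
        alpha[k] += reward
        beta[k] += 1 - reward
        pulls[k] += 1
    return pulls
```

Over a long horizon the pull counts concentrate on the best arm, illustrating the asymptotic convergence the excerpt attributes to Agrawal and Goyal (2012).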
“…To balance exploitation with exploration, we use a sampling algorithm to produce plausible estimates of the probability of a click and the probability of a "yes" survey. We compared both Thompson sampling [16,20] and EwS [13], and qualitatively we found that results in our application looked better with Thompson sampling so we focus on it here. However, EwS is reasonable to use as well.…”
Section: Learning With Discrete Context
confidence: 99%
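A minimal sketch of the sampling idea in the excerpt above: draw plausible estimates of both the click probability and the "yes"-survey probability from independent Beta posteriors. Everything here is an assumption for illustration — the Beta(1,1) priors, the independence of the two outcomes, and especially the multiplicative combination of the two sampled probabilities, which may differ from the cited paper's actual objective:

```python
import random

def sample_plausible_estimates(clicks, shows, yes, surveys):
    """Per action, draw one plausible (p_click, p_yes) pair from
    independent Beta posteriors under assumed Beta(1, 1) priors."""
    estimates = []
    for c, s, y, n in zip(clicks, shows, yes, surveys):
        p_click = random.betavariate(1 + c, 1 + s - c)
        p_yes = random.betavariate(1 + y, 1 + n - y)
        estimates.append((p_click, p_yes))
    return estimates

def select_action(clicks, shows, yes, surveys):
    """Thompson-style choice over sampled estimates.

    Hypothetical scoring: the two sampled probabilities are combined
    multiplicatively; the paper's real objective may weight them
    differently.
    """
    est = sample_plausible_estimates(clicks, shows, yes, surveys)
    return max(range(len(est)), key=lambda k: est[k][0] * est[k][1])
```

Because the estimates are posterior samples rather than point estimates, actions with uncertain click or survey rates are occasionally selected, giving the exploration/exploitation balance the excerpt describes.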