2017
DOI: 10.1214/17-ejs1341si

Linear Thompson sampling revisited

Abstract: We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order O(d^{3/2}√T) as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a ge…
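For context, the algorithm analyzed is linear TS: at each round, sample a parameter from a Gaussian centered at the regularized least-squares (RLS) estimate and play the arm that is best for that sample. Below is a minimal sketch of this scheme; the function name, the finite arm set, and the plain covariance scaling are illustrative assumptions, not the paper's exact construction (the proof in the paper requires a specific inflation of the sampling distribution).

```python
import numpy as np

def linear_ts(arms, reward_fn, T, lam=1.0, sigma=1.0):
    """Sketch of Thompson sampling for a stochastic linear bandit with a
    finite arm set `arms` (shape K x d). Hypothetical interface."""
    d = arms.shape[1]
    V = lam * np.eye(d)   # regularized design matrix V_t
    b = np.zeros(d)       # running sum of r_s * x_s
    for _ in range(T):
        theta_hat = np.linalg.solve(V, b)        # RLS estimate
        cov = sigma**2 * np.linalg.inv(V)        # sampling covariance (assumed scaling)
        theta_tilde = np.random.multivariate_normal(theta_hat, cov)
        x = arms[np.argmax(arms @ theta_tilde)]  # arm optimal for the sampled parameter
        r = reward_fn(x)                         # observe noisy reward
        V += np.outer(x, x)                      # rank-one design update
        b += r * x
```

The abstract's point is that, with constant probability, the sampled parameter is optimistic (it over-estimates the optimal value), and pulling the arm that is optimal for such a parameter is what keeps the regret under control.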

Cited by 134 publications (203 citation statements). References 15 publications.
“…To mitigate this shortcoming, we propose a unified method based on the BLR approximation of these two methods. This unification is inspired by the analyses in Abeille and Lazaric [4] and Abbasi-Yadkori et al. [1] for linear bandits. 1) For LINPSRL: we deploy BLR to approximate the posterior distribution over the Q-function using a conjugate Gaussian prior and likelihood.…”
Section: Strategy
confidence: 98%
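The conjugate Gaussian prior/likelihood pairing mentioned in this excerpt gives Bayesian linear regression (BLR) a closed-form Gaussian posterior. A minimal sketch, with a hypothetical function name and interface:

```python
import numpy as np

def blr_posterior(X, y, prior_prec=1.0, noise_var=1.0):
    """Conjugate Gaussian posterior for Bayesian linear regression:
    prior w ~ N(0, prior_prec^-1 I), likelihood y|x,w ~ N(x^T w, noise_var).
    Returns the posterior mean and covariance, both in closed form."""
    d = X.shape[1]
    precision = prior_prec * np.eye(d) + (X.T @ X) / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov
```

Because the posterior stays Gaussian after every update, sampling from it (as TS does) remains cheap, which is what makes BLR a convenient approximation target.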
“…1) For LINPSRL: we deploy BLR to approximate the posterior distribution over the Q-function using a conjugate Gaussian prior and likelihood. In tabular MDPs, this approach turns out to be similar to Osband et al. (see also Abeille and Lazaric [4]). These two approximation procedures result in the same Gaussian distribution, and therefore, the same algorithm.…”
Section: Strategy
confidence: 99%
“…For MAB and LB, Agrawal and Goyal [2] proved a high-probability regret bound. Abeille and Lazaric [19] showed the same regret bound in an alternative way and revealed conditions under which variants of the TS algorithm enjoy such regret bounds. For the combinatorial semi-bandit problem and generalized problems including CLS, Wen et al. [10] proved a bound on the Bayes cumulative regret proposed by Russo and Van Roy [23].…”
Section: Related Work
confidence: 75%
“…TS algorithms have been theoretically analyzed for several problems [2], [10], [16], [19]–[24]. For MAB and LB, Agrawal and Goyal [2] proved a high-probability regret bound.…”
Section: Related Work
confidence: 99%
“…GT-TS workflow. GT-TS is a multi-armed bandit algorithm for experimental design that relies on two main concepts: i) an estimate of a population's cell-type discovery potential based on a variant of the Good-Toulmin estimator, and ii) a classic Thompson Sampling (TS) routine [Abeille and Lazaric, 2017; Russo et al., 2017]. Pseudocode available in Supplementary Note 1.…”
Section: Discussion
confidence: 99%
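The "classic Thompson Sampling routine" this excerpt refers to is, in its simplest Beta-Bernoulli form, one posterior draw per arm followed by an argmax. A minimal sketch of that routine only; the GT-TS coupling with the Good-Toulmin estimate is not reproduced here, and the function name is hypothetical:

```python
import numpy as np

def ts_step(successes, failures):
    """One round of Beta-Bernoulli Thompson sampling: draw a success
    probability per arm from its Beta posterior (uniform Beta(1,1) prior)
    and return the index of the arm with the largest draw."""
    samples = np.random.beta(successes + 1, failures + 1)
    return int(np.argmax(samples))
```

For example, with `successes = np.array([3, 10])` and `failures = np.array([7, 2])`, the second arm is chosen most of the time, but the first still gets occasional exploratory pulls.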