2014
DOI: 10.1186/2194-3206-2-2

Generalized Thompson sampling for sequential decision-making and causal inference

Abstract: Purpose: Sampling an action according to the probability that the action is believed to be the optimal one is sometimes called Thompson sampling. Methods: Although mostly applied to bandit problems, Thompson sampling can also be used to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution over actions can then be constructed by a Bayesian superposition of the policies weighted by their posterior probability of being optimal. Res…
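As an illustration of the first sentence of the abstract, here is a minimal sketch of Thompson sampling in its most familiar setting, a Bernoulli bandit with a Beta posterior per arm. It is not the generalized method of the paper, only the standard bandit special case. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the simulated reward probabilities are assumptions made for this example. Drawing one sample per arm and acting greedily on the samples selects each arm with exactly the posterior probability that it is the best one.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Sample one plausible mean per arm from its Beta posterior
        # (assumed uniform Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act optimally for the sampled environment.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulated environment: Bernoulli reward with the arm's true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, _, _ = thompson_sampling([0.1, 0.9], n_rounds=2000, seed=1)
    print(pulls)  # the better arm should receive most of the pulls
```

The same two-step pattern (sample an environment from the posterior, then act as if it were the true one) is what the abstract extends to settings where an optimal policy is known for each candidate environment.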

Cited by 22 publications (18 citation statements)
References 58 publications
“…Formally, the personalization problem can be regarded a contextual bandit problem (Yue et al, 2012;Ortega and Braun, 2013). The context X is given by the individual identifier and presents itself to the system.…”
Section: Exploration Versus Exploitation In Personalization (mentioning)
confidence: 99%
“…This interpretation builds on previous work that has related computational and physical processes; see for example [54] for an overview. As discussed in the Methods, the cost of changing distributions can also be expressed in terms of complexity of sampling processes [50,51].…”
Section: Discussion (mentioning)
confidence: 99%
“…The computational complexity of the information-theoretic model of bounded rational decision making can also be interpreted in terms of a sampling complexity [50,51]. In particular, Equation (4) can be interpreted under a rejection sampling scheme where we want to obtain samples from $P(a_i)$, but we are only able to sample from the distribution $P_0(a_i)$.…”
Section: Methods (mentioning)
confidence: 99%
“…Another option discussed in the literature is Thompson sampling (Ortega and Braun, 2010, 2014; Aslanides et al, 2017). In our framework this corresponds to a two step action selection procedure where we first sample an environment and parameter pair $(\bar{\hat{e}}_{t-1}, \bar{\theta})$ from a posterior factor (Bayesian or variational)…”
Section: Action Selection Based On Intrinsic Motivations (mentioning)
confidence: 99%