2013 Winter Simulations Conference (WSC)
DOI: 10.1109/wsc.2013.6721457

Optimal learning with non-Gaussian rewards

Abstract: In this dissertation, we study sequential Bayesian learning problems modeled under non-Gaussian distributions. We focus on a class of problems known as the multi-armed bandit problem and study its optimal learning strategy, the Gittins index policy. The Gittins index is computationally intractable, and approximation methods have been developed for Gaussian reward problems. We construct a novel theoretical and computational framework for the Gittins index under non-Gaussian rewards. By interpolating the…
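The dissertation's own framework for non-Gaussian rewards is only sketched in the truncated abstract. As a point of reference for why the Gittins index is considered computationally intractable, the following is a minimal, generic sketch (not the author's method) that approximates the index of a single Bernoulli arm with a Beta prior by bisection over a retirement reward, using finite-horizon dynamic programming. The discount factor, horizon, tolerance, and function name are illustrative assumptions.

```python
import math

def gittins_index_bernoulli(a0, b0, gamma=0.9, horizon=100, tol=1e-4):
    """Approximate the Gittins index of a Bernoulli arm with a Beta(a0, b0)
    posterior by bisection on a retirement reward lam. Values beyond the
    horizon are truncated to zero, so the truncation error is of order
    gamma**horizon / (1 - gamma). Illustrative sketch only."""

    def continuation_value(lam):
        # Backward induction over the Beta-posterior lattice.
        # At stage t we have observed t extra samples, so a + b - a0 - b0 = t.
        V_next = {}  # values at stage `horizon`, truncated to 0
        for t in range(horizon - 1, -1, -1):
            V = {}
            for s in range(t + 1):           # s successes among t samples
                a, b = a0 + s, b0 + (t - s)
                p = a / (a + b)              # posterior mean success probability
                retire = lam / (1.0 - gamma)
                cont = p * (1.0 + gamma * V_next.get((a + 1, b), 0.0)) \
                     + (1.0 - p) * gamma * V_next.get((a, b + 1), 0.0)
                V[(a, b)] = max(retire, cont)
            V_next = V
        return V_next[(a0, b0)]

    # The index is the smallest lam at which retiring immediately is
    # (weakly) optimal; locate it by bisection on [0, 1].
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if continuation_value(lam) > lam / (1.0 - gamma) + 1e-12:
            lo = lam   # continuing still beats retiring: index is larger
        else:
            hi = lam
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Illustrative call: uniform Beta(1, 1) prior, discount factor 0.9.
    print(round(gittins_index_bernoulli(1, 1), 3))
```

Even in this simplest conjugate case, each evaluation of a candidate index requires a full dynamic program over the posterior lattice, which is why approximation schemes are needed, and why closed-form or interpolation-based approximations have so far focused on Gaussian rewards.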

Cited by 3 publications (2 citation statements); references 40 publications.
“…First, the rate of convergence is optimized when the exact OCBA ratios are used, but this is not guaranteed if those ratios are only achieved asymptotically. Second, it is likely that the normality assumptions play a key role in the equivalence, since EI-type methods may not even be consistent in non-normal settings (Ding and Ryzhov 2015). Nonetheless, normality assumptions remain widely used in the literature and in practice, and it may be possible to obtain similar results for other learning problems where such assumptions are made.…”
Section: Results (mentioning)
confidence: 99%
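For context on what "the exact OCBA ratios" in the statement above refers to, here is a minimal sketch of the classical asymptotic OCBA allocation for independent normal designs (Chen et al.); the function name and the example numbers are illustrative assumptions, not code from the cited works.

```python
import math

def ocba_ratios(means, sigmas):
    """Asymptotic OCBA allocation fractions for independent normal designs:
    for non-best designs i, j,
        N_i / N_j = (sigma_i / delta_i)^2 / (sigma_j / delta_j)^2,
    with delta_i = mu_best - mu_i, and for the best design b,
        N_b = sigma_b * sqrt(sum_{i != b} N_i^2 / sigma_i^2).
    Returns fractions normalized to sum to one."""
    b = max(range(len(means)), key=lambda i: means[i])
    weights = {}
    for i, (m, s) in enumerate(zip(means, sigmas)):
        if i != b:
            delta = means[b] - m
            weights[i] = (s / delta) ** 2          # relative budget for i != b
    weights[b] = sigmas[b] * math.sqrt(sum(w * w / sigmas[i] ** 2
                                           for i, w in weights.items()))
    total = sum(weights.values())
    return [weights[i] / total for i in range(len(means))]

# Illustrative numbers: three designs with equal noise; the best design and
# its closest competitor receive most of the sampling budget.
print([round(r, 3) for r in ocba_ratios([1.0, 0.8, 0.5], [1.0, 1.0, 1.0])])
```

The point quoted above is that the large-deviations rate is optimized when sampling frequencies match these ratios exactly, whereas a sequential policy that only reaches them asymptotically need not attain the same rate.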
“…They observe that ν_a^KG can be zero, but do not appear to recognize that this can yield dominated actions under the policy. Later work (Ding and Ryzhov [4]) showed that this can lead to the offline KG policy never choosing the greedy arm, an extreme case of dominated errors. However, with the online KG policy the greedy arm will eventually be selected as ν_a^KG for the other arm tends to zero.…”
Section: Exponential Rewards (mentioning)
confidence: 99%
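The statement above concerns the knowledge-gradient value ν_a^KG being exactly zero in the exponential-reward setting. As a familiar point of comparison only, the sketch below computes the standard Gaussian KG factor (variable names and example numbers are assumptions, not the cited authors' code); under normal beliefs this quantity is strictly positive but decays toward zero very quickly, whereas the cited point is that under non-Gaussian rewards the analogous quantity can be exactly zero, so an offline policy ranking arms by it can stop choosing the greedy arm altogether.

```python
from math import erf, exp, pi, sqrt

def gaussian_kg_value(mu, sigma_tilde, a):
    """Knowledge-gradient value nu^KG_a under independent normal beliefs:
        nu^KG_a = sigma_tilde_a * f(zeta_a),  f(z) = z * Phi(z) + phi(z),
        zeta_a  = -|mu_a - max_{b != a} mu_b| / sigma_tilde_a.
    Standard formula from the Gaussian ranking-and-selection literature."""
    def phi(z):   # standard normal pdf
        return exp(-0.5 * z * z) / sqrt(2.0 * pi)
    def Phi(z):   # standard normal cdf
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))
    best_other = max(m for b, m in enumerate(mu) if b != a)
    zeta = -abs(mu[a] - best_other) / sigma_tilde[a]
    return sigma_tilde[a] * (zeta * Phi(zeta) + phi(zeta))

# Illustrative numbers: as the predictive spread of arm 1 shrinks, its KG
# value collapses toward zero, so a rule ranking arms by nu^KG alone can
# effectively stop sampling an arm whose value of information has vanished.
mu = [1.0, 0.2]
for s in (1.0, 0.1, 0.01):
    print(s, gaussian_kg_value(mu, [1.0, s], 1))
```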