2013 Australian Control Conference
DOI: 10.1109/aucc.2013.6697280

On effectiveness of the Mirror Descent Algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain

Abstract: In this article, we study the effectiveness of the recently developed Mirror Descent Randomized Control Algorithm applied to a class of homogeneous finite Markov chains governed by a stochastic multi-armed bandit with unknown mean losses. We prove explicit, non-asymptotic upper and lower bounds on the mean losses at a given (finite) time horizon. The two bounds have very similar dependence on the problem parameters and the time horizon, but differ in a logarithmic term and in the absolute constant. Numerical example il…
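To make the setting concrete, here is a minimal sketch of entropic mirror descent (in dual-averaging form) for a stochastic N-armed bandit with losses of unknown means. It is an illustration under stated assumptions, not the authors' Mirror Descent Randomized Control Algorithm. It assumes i.i.d. Bernoulli losses (the Markov-chain structure studied in the paper is not modeled), importance-weighted loss estimates, and a temperature schedule beta_t = beta0 * sqrt(t). The names `mirror_descent_bandit`, `softmax_neg`, and `beta0` are hypothetical.

```python
import math
import random


def softmax_neg(z, beta):
    """Gibbs distribution p_i proportional to exp(-z_i / beta), computed stably."""
    m = min(z)
    w = [math.exp(-(zi - m) / beta) for zi in z]
    s = sum(w)
    return [wi / s for wi in w]


def mirror_descent_bandit(mean_losses, horizon, beta0=1.0, seed=0):
    """Hypothetical sketch: entropic mirror descent (dual averaging) on the
    simplex for a stochastic bandit with Bernoulli losses.

    Returns (average observed loss over the horizon, last arm distribution).
    """
    rng = random.Random(seed)
    n = len(mean_losses)
    zeta = [0.0] * n            # dual variable: accumulated loss estimates
    total_loss = 0.0
    p = [1.0 / n] * n
    for t in range(1, horizon + 1):
        beta = beta0 * math.sqrt(t)       # assumed temperature schedule
        p = softmax_neg(zeta, beta)       # mirror step with the entropy prox-function
        arm = rng.choices(range(n), weights=p)[0]
        loss = 1.0 if rng.random() < mean_losses[arm] else 0.0
        total_loss += loss
        # Unbiased importance-weighted estimate: only the played arm is updated.
        # p[arm] > 0 because an arm with zero weight is never drawn.
        zeta[arm] += loss / p[arm]
    return total_loss / horizon, p


if __name__ == "__main__":
    avg, p = mirror_descent_bandit([0.9, 0.1, 0.8], horizon=5000)
    print(f"average loss {avg:.3f}, final probabilities {[round(x, 3) for x in p]}")
```

With a clear gap between the best arm (mean loss 0.1) and the others, the distribution concentrates on the best arm, and the average loss approaches its mean loss at a rate governed by the temperature schedule. The growth of beta_t is the design choice that trades exploration against exploitation; the paper's actual tuning and its finite-horizon bounds should be taken from the article itself.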
