Multi-armed bandit problems are a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, where the reward distributions do not change over time, Upper-Confidence Bound (UCB) policies, proposed in Agrawal (1995) and later analyzed in Auer et al. (2002), have been shown to be rate optimal. A challenging variant of the multi-armed bandit problem is the non-stationary bandit problem, in which the gambler must decide which arm to play while facing the possibility of a changing environment. In this paper, we consider the situation where the reward distributions remain constant over epochs and change at unknown time instants. We analyze two algorithms: discounted UCB and sliding-window UCB. For both algorithms we establish an upper bound on the expected regret by upper-bounding the expected number of times a suboptimal arm is played. For that purpose, we derive a Hoeffding-type inequality for self-normalized deviations with a random number of summands. We also establish a lower bound on the regret in the presence of abrupt changes in the arms' reward distributions, and show that discounted UCB and sliding-window UCB both match this lower bound up to a logarithmic factor.
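To make the sliding-window policy concrete, here is a minimal Python sketch of a sliding-window UCB index: the score of an arm is its empirical mean over the last tau rounds plus an exploration bonus of order sqrt(xi * log(min(t, tau)) / N), where N counts the plays of that arm inside the window. The window length `tau`, the constants `xi` and `B`, and the `pull` callback are illustrative assumptions, not the exact parameterization analyzed in the paper.

```python
import math
from collections import deque

def sliding_window_ucb(pull, n_arms, horizon, tau=500, xi=0.6, B=1.0):
    """Sketch of sliding-window UCB: the index of an arm is its empirical
    mean over the last `tau` rounds plus an exploration bonus
    B * sqrt(xi * log(min(t, tau)) / N), where N is the number of plays
    of that arm inside the window."""
    window = deque()           # (arm, reward) pairs currently in the window
    counts = [0] * n_arms      # plays of each arm inside the window
    sums = [0.0] * n_arms      # reward sums inside the window
    total_reward = 0.0

    for t in range(1, horizon + 1):
        def index(i):
            if counts[i] == 0:
                return float("inf")   # force exploration of arms absent from the window
            bonus = B * math.sqrt(xi * math.log(min(t, tau)) / counts[i])
            return sums[i] / counts[i] + bonus

        arm = max(range(n_arms), key=index)
        reward = pull(arm)     # pull(arm) is assumed to return a reward in [0, B]
        total_reward += reward

        # slide the window: add the new observation, drop the oldest one
        window.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward
        if len(window) > tau:
            old_arm, old_reward = window.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward
    return total_reward
```

Discounted UCB follows the same template, except that past rewards are down-weighted geometrically instead of being dropped once they leave a fixed-length window.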
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins [J. R. Stat. Soc. Ser. B Stat. Methodol. 41 (1979) 148-177], based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: the kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins [Adv. in Appl. Math. 6 (1985) 4-22] and Burnetas and Katehakis [Adv. in Appl. Math. 17 (1996) 122-142], respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.
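As an illustration of the index computation, the sketch below (Python) computes the kl-UCB upper confidence bound in the Bernoulli case: the largest mean q such that N_a(t) * kl(mean, q) stays below a threshold of order log t, found by bisection since the Bernoulli Kullback-Leibler divergence kl(p, .) is increasing on [p, 1]. The constant `c` and the tolerance are illustrative choices, not the paper's exact tuning.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=3.0, tol=1e-6):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t) + c*log(log(t)),
    computed by bisection (kl(mean, .) is increasing on [mean, 1])."""
    threshold = math.log(t) + c * math.log(max(math.log(t), 1.0))
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mean, mid) > threshold:
            hi = mid
        else:
            lo = mid
    return lo
```

A kl-UCB policy then pulls, at each round, the arm with the largest such index, after an initialization phase in which each arm is played once.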
Computing smoothing distributions, that is, the distributions of one or more states conditional on past, present, and future observations, is a recurring problem when operating on general hidden Markov models. The aim of this paper is to provide a foundation for particle-based approximation of such distributions and to analyze, in a common unifying framework, different schemes producing such approximations. In this setting, general convergence results, including exponential deviation inequalities and central limit theorems, are established. In particular, time-uniform bounds on the marginal smoothing error are obtained under appropriate mixing conditions on the transition kernel of the latent chain. In addition, we propose an algorithm approximating the joint smoothing distribution at a cost that grows only linearly with the number of particles. Since the first version of this paper was released, an article [11] has been published. That work, developed completely independently from ours, complements the results presented in this manuscript. In particular, it presents a functional central limit theorem as well as nonasymptotic variance bounds. Additionally, it shows how the forward-filtering backward-smoothing estimates of additive functionals can be computed using a forward-only recursion.
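For concreteness, here is a minimal Python/NumPy sketch of one family of schemes of the kind analyzed in the paper: a bootstrap particle filter run forward in time, followed by backward simulation of smoothed trajectories. The model callbacks (`init`, `propagate`, `likelihood`, `transition_density`), the scalar latent state, and the multinomial resampling are illustrative assumptions; the linear-cost joint smoother proposed in the paper relies on further refinements not shown here.

```python
import numpy as np

def bootstrap_filter(ys, n_particles, init, propagate, likelihood):
    """Forward pass: bootstrap particle filter.  Stores, for every time step,
    the particle cloud and the normalized importance weights, which are
    reused by the backward pass below."""
    T = len(ys)
    particles = np.empty((T, n_particles))
    weights = np.empty((T, n_particles))
    x = init(n_particles)
    for t in range(T):
        if t > 0:
            # multinomial resampling, then mutation through the prior kernel
            idx = np.random.choice(n_particles, size=n_particles, p=weights[t - 1])
            x = propagate(particles[t - 1, idx])
        w = likelihood(ys[t], x)
        particles[t] = x
        weights[t] = w / w.sum()
    return particles, weights

def backward_simulation(particles, weights, transition_density, n_paths):
    """Backward pass: draw approximately smoothed trajectories by sampling,
    at each step, the previous-state index with probability proportional to
    w_t^i * q(x_t^i, x_{t+1}), where q is the transition density of the
    latent chain."""
    T, n_particles = particles.shape
    paths = np.empty((n_paths, T))
    for k in range(n_paths):
        j = np.random.choice(n_particles, p=weights[-1])
        paths[k, -1] = particles[-1, j]
        for t in range(T - 2, -1, -1):
            probs = weights[t] * transition_density(particles[t], paths[k, t + 1])
            probs /= probs.sum()
            j = np.random.choice(n_particles, p=probs)
            paths[k, t] = particles[t, j]
    return paths
```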
This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal distribution provide benchmarks for coding techniques originating from the theory of universal coding over finite alphabets. We prove general upper bounds on minimax regret and lower bounds on minimax redundancy for such source classes. The general upper bounds emphasize the role of the Normalized Maximum Likelihood codes with respect to minimax regret in the infinite-alphabet context. Lower bounds are derived by tailoring sharp bounds on the redundancy of Krichevsky-Trofimov coders for sources over finite alphabets. Up to logarithmic (resp. constant) factors, the bounds are matching for source classes defined by algebraically declining (resp. exponentially vanishing) envelopes. Effective and (almost) adaptive coding techniques are described for the collection of source classes defined by algebraically vanishing envelopes. These results extend our knowledge of universal coding to contexts where the key tools from parametric inference are known to fail.

Keywords: NML; countable alphabets; redundancy; adaptive compression; minimax.

I. INTRODUCTION

This paper is concerned with the problem of universal coding on a countably infinite alphabet $\mathcal{X}$ (say the set of positive integers $\mathbb{N}_+$ or the set of integers $\mathbb{N}$) as described for example by . Throughout this paper, a source on the countable alphabet $\mathcal{X}$ is a probability distribution on the set $\mathcal{X}^{\mathbb{N}}$ of infinite sequences of symbols from $\mathcal{X}$ (this set is endowed with the $\sigma$-algebra generated by sets of the form $\prod_{i=1}^{n}\{x_i\} \times \mathcal{X}^{\mathbb{N}}$ where all $x_i \in \mathcal{X}$ and $n \in \mathbb{N}$). The symbol $\Lambda$ will be used to denote various classes of sources on the countably infinite alphabet $\mathcal{X}$. The sequence of symbols emitted by a source is denoted by the $\mathcal{X}^{\mathbb{N}}$-valued random variable $X = (X_n)_{n \in \mathbb{N}}$. If $P$ denotes the distribution of $X$, $P^n$ denotes the distribution of $X_{1:n} = X_1, \ldots, X_n$, and we let $\Lambda^n = \{P^n : P \in \Lambda\}$. For any countable set $\mathcal{X}$, let $\mathcal{M}_1(\mathcal{X})$ be the set of all probability measures on $\mathcal{X}$.

From Shannon's noiseless coding theorem (see Cover and Thomas, 1991), the binary entropy of $P^n$, $H(X_{1:n}) = \mathbb{E}_{P^n}[-\log P(X_{1:n})]$, provides a tight lower bound on the expected number of binary symbols needed to encode outcomes of $P^n$. Throughout the paper, logarithms are in base 2. In the following, we shall only consider finite-entropy sources on countable alphabets, and we implicitly assume that $H(X_{1:n}) < \infty$. The expected redundancy of any distribution $Q^n \in \mathcal{M}_1(\mathcal{X}^n)$, defined as the difference between the expected code length $\mathbb{E}_{P^n}[-\log Q^n(X_{1:n})]$ and $H(X_{1:n})$, is equal to the Kullback-Leibler divergence (or relative entropy)
$$D(P^n, Q^n) = \sum_{x \in \mathcal{X}^n} P^n\{x\} \log \frac{P^n(x)}{Q^n(x)} = \mathbb{E}_{P^n}\left[\log \frac{P^n(X_{1:n})}{Q^n(X_{1:n})}\right].$$
Universal coding attempts to develop sequences of coding probabilities $(Q^n)_n$ so as to minimize expected redundancy over a whole class of sources. Technically speaking, several distinct notions of universa...
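As a concrete illustration of the finite-alphabet building block mentioned above, the sketch below (Python, with illustrative function names) computes the code length assigned to a sequence by the Krichevsky-Trofimov mixture coder over an alphabet of size k, together with the maximum-likelihood (empirical-entropy) benchmark used to define pointwise regret. The escape mechanisms needed to handle infinite alphabets and envelope classes are not shown.

```python
import math

def kt_codelength(sequence, alphabet_size):
    """Code length (in bits) assigned to `sequence` by the Krichevsky-Trofimov
    mixture coder over an alphabet of size k.  The sequential predictive
    probability of symbol j after n symbols is (n_j + 1/2) / (n + k/2),
    where n_j is the number of past occurrences of j."""
    counts = [0] * alphabet_size
    codelength = 0.0
    for n, symbol in enumerate(sequence):
        prob = (counts[symbol] + 0.5) / (n + alphabet_size / 2.0)
        codelength -= math.log2(prob)
        counts[symbol] += 1
    return codelength

def empirical_entropy(sequence, alphabet_size):
    """Ideal code length (in bits) under the maximum-likelihood memoryless
    model, i.e. n times the entropy of the empirical distribution."""
    counts = [0] * alphabet_size
    for symbol in sequence:
        counts[symbol] += 1
    n = len(sequence)
    return -sum(c * math.log2(c / n) for c in counts if c > 0)
```

The pointwise regret of the KT coder on a sequence is the gap between these two quantities; the redundancy bounds for this finite-alphabet coder are the ones tailored in the paper's lower-bound arguments.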