In the context of K-armed stochastic bandits with distributions only assumed to be supported in [0, 1], we introduce the first algorithm, called KL-UCB-switch, that simultaneously enjoys a distribution-free regret bound of optimal order √(KT) and a distribution-dependent regret bound of optimal order as well, that is, matching the κ ln T lower bound by Lai and Robbins [1985] and Burnetas and Katehakis [1996]. This self-contained contribution simultaneously presents state-of-the-art techniques for regret minimization in bandit models and an elementary construction of non-asymptotic confidence bounds based on the empirical likelihood method for bounded distributions.
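To make the switching rule concrete, here is a minimal Python sketch of a KL-UCB-switch-style policy. It is a simplification, not the paper's exact algorithm: the empirical-likelihood index is replaced by its Bernoulli-KL relaxation (valid for [0, 1]-bounded rewards, since the Bernoulli kl lower-bounds the general KL-inf), the pull-count threshold is taken as (T/K)^(1/5), and the arm samplers `arms` are hypothetical stand-ins for the environment.

```python
import math

def kl_bernoulli(p, q):
    """Bernoulli KL divergence kl(p, q), clipped away from 0 and 1."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    q = min(max(q, 1e-12), 1 - 1e-12)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mu_hat, n, level):
    """Largest q >= mu_hat such that n * kl(mu_hat, q) <= level (bisection)."""
    lo, hi = mu_hat, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if n * kl_bernoulli(mu_hat, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

def moss_index(mu_hat, n, T, K):
    """MOSS-style index for [0, 1]-bounded (hence 1/2-sub-Gaussian) rewards."""
    return mu_hat + math.sqrt(max(math.log(T / (K * n)), 0.0) / (2 * n))

def kl_ucb_switch(arms, T):
    """Run a KL-UCB-switch-style policy for T rounds on `arms`,
    a list of K samplers returning rewards in [0, 1]."""
    K = len(arms)
    switch_at = math.floor((T / K) ** 0.2)   # pull-count threshold (T/K)^(1/5)
    counts, sums = [0] * K, [0.0] * K
    for t in range(T):
        if t < K:                            # initialization: pull each arm once
            a = t
        else:
            def index(a):
                n = counts[a]
                mu_hat = sums[a] / n
                level = max(math.log(T / (K * n)), 0.0)
                if n <= switch_at:           # few pulls: KL-type index
                    return kl_ucb_index(mu_hat, n, level)
                return moss_index(mu_hat, n, T, K)   # many pulls: MOSS index
            a = max(range(K), key=index)
        counts[a] += 1
        sums[a] += arms[a]()
    return counts
```

The point of the switch is visible in the two branches: the KL-type index drives the ln T distribution-dependent guarantee for arms with few pulls, while the MOSS index controls the √(KT) worst case once an arm has been pulled often.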
Stochastic and adversarial data are two widely studied settings in online learning. But many optimization tasks are neither i.i.d. nor fully adversarial, which makes it of fundamental interest to get a better theoretical understanding of the world between these extremes. In this work we establish novel regret bounds for online convex optimization in a setting that interpolates between stochastic i.i.d. and fully adversarial losses. By exploiting smoothness of the expected losses, these bounds replace a dependence on the maximum gradient length by the variance of the gradients, which was previously known only for linear losses. In addition, they weaken the i.i.d. assumption by allowing adversarially poisoned rounds or shifts in the data distribution. To accomplish this goal, we introduce two key quantities associated with the loss sequence, that we call the cumulative stochastic variance and the adversarial variation. Our upper bounds are attained by instances of optimistic follow-the-regularized-leader (FTRL), and we design adaptive learning rates that automatically adapt to the cumulative stochastic variance and adversarial variation. In the fully i.i.d. case, our bounds match the rates one would expect from results in stochastic acceleration, and in the fully adversarial case they gracefully deteriorate to match the minimax regret. We further provide lower bounds showing that our regret upper bounds are tight for all intermediate regimes for the cumulative stochastic variance and the adversarial variation.
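As an illustration of the kind of algorithm behind such upper bounds, here is a minimal sketch of optimistic FTRL with a squared-norm regularizer over a Euclidean ball, where the learning rate adapts to the cumulative squared error of the hints. The names (`optimistic_ftrl`, the ball radius `D`) and the specific tuning are ours, a generic instance rather than the paper's exact algorithm.

```python
import numpy as np

def optimistic_ftrl(grads, hints, D=1.0):
    """Optimistic FTRL on the Euclidean ball of radius D.
    At round t it plays argmin_x <L_{t-1} + m_t, x> + ||x||^2 / (2 * eta_t),
    with eta_t adapted to the cumulative squared hint error sum ||g_s - m_s||^2."""
    d = len(grads[0])
    grad_sum = np.zeros(d)     # L_{t-1}: sum of past gradients
    err2 = 1e-8                # cumulative squared hint errors (small init)
    plays = []
    for g, m in zip(grads, hints):     # the hint m is available before g is revealed
        eta = D / np.sqrt(err2)
        x = -eta * (grad_sum + m)      # unconstrained FTRL minimizer
        norm = np.linalg.norm(x)
        if norm > D:                   # project back onto the ball
            x = (D / norm) * x
        plays.append(x)
        grad_sum += g
        err2 += np.linalg.norm(g - m) ** 2
    return plays
```

With the standard hint choice m_t = g_{t-1}, the error term sums squared successive gradient differences; under i.i.d. smooth losses its expectation is governed by the variance of the gradients, which is the mechanism by which bounds of this type replace the maximum gradient length.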
In this paper we consider a distributed online learning setting for joint regret with communication constraints. This is a multi-agent setting in which in each round t an adversary activates an agent, which has to issue a prediction. A subset of all the agents may then communicate a b-bit message to their neighbors in a graph. All agents cooperate to control the joint regret, which is the sum of the losses of the agents minus the losses evaluated at the best fixed common comparator parameters u. We provide a comparator-adaptive algorithm for this setting, which means that the joint regret scales with the norm ∥u∥ of the comparator. To address communication constraints we provide deterministic and stochastic gradient compression schemes and show that with these compression schemes our algorithm has worst-case optimal regret for the case that all agents communicate in every round. Additionally, we exploit the comparator-adaptive property of our algorithm to learn the best partition from a set of candidate partitions, which allows different subsets of agents to learn a different comparator.
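The compression step can be pictured with a standard unbiased stochastic quantizer, a sketch of the general technique rather than the paper's specific deterministic and stochastic schemes: each coordinate is randomly rounded to a small grid so that the message costs roughly log2(levels) bits per coordinate while remaining unbiased in expectation.

```python
import numpy as np

def stochastic_quantize(g, levels=2, rng=None):
    """Unbiased stochastic quantization of a vector g: each coordinate is
    randomly rounded to one of `levels` grid points in [-s, s], s = ||g||_inf,
    so that E[output] = g. Sending the result costs about
    d * log2(levels) bits plus the bits needed for the scale s."""
    rng = rng or np.random.default_rng()
    s = np.max(np.abs(g))
    if s == 0.0:
        return g.copy()
    grid = np.linspace(-s, s, levels)
    step = grid[1] - grid[0]
    pos = np.clip((g - grid[0]) / step, 0, levels - 1)  # fractional grid position
    low = np.floor(pos).astype(int)
    frac = pos - low                       # probability of rounding upward
    up = rng.random(g.shape) < frac
    return grid[np.minimum(low + up, levels - 1)]
```

Unbiasedness follows because a coordinate lying between two grid points is rounded up with probability exactly proportional to its distance from the lower point, so the rounding error averages out to zero.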
We consider stochastic bandit problems with K arms, each associated with a bounded distribution supported on the range [m, M]. We do not assume that the range [m, M] is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which, for instance, prevents one from simultaneously achieving the typical ln T and √T bounds: a √T distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order √T. We exhibit a strategy achieving the rates for regret indicated by the new trade-off.
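To illustrate where the cost of an unknown range comes from, consider a hypothetical UCB-style policy that rescales its confidence widths by the range observed so far (our illustration, not the paper's strategy): early on, the observed range underestimates M − m, so the policy may under-explore, and guarding against this is precisely what creates the trade-off stated above.

```python
import math

def ucb_unknown_range(arms, T):
    """UCB-style sketch for rewards in an unknown range [m, M]:
    confidence widths are scaled by the range observed so far.
    Illustrative only; under-estimating the range early can cause
    under-exploration, the phenomenon behind the trade-off."""
    K = len(arms)
    counts, sums = [0] * K, [0.0] * K
    lo, hi = math.inf, -math.inf          # running estimate of [m, M]
    for t in range(T):
        if t < K:                         # pull each arm once
            a = t
        else:
            width = max(hi - lo, 1e-12)   # stand-in for the unknown M - m
            a = max(range(K), key=lambda a: sums[a] / counts[a]
                    + width * math.sqrt(2 * math.log(t) / counts[a]))
        r = arms[a]()
        lo, hi = min(lo, r), max(hi, r)
        counts[a] += 1
        sums[a] += r
    return counts
```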
A sequence of works in unconstrained online convex optimisation has investigated the possibility of adapting simultaneously to the norm U of the comparator and the maximum norm G of the gradients. In full generality, matching upper and lower bounds are known, which show that this comes at the unavoidable cost of an additive GU³ term that is not needed when either G or U is known in advance. Surprisingly, recent results by Kempka et al. (2019) show that no such price for adaptivity is needed in the specific case of 1-Lipschitz losses like the hinge loss. We follow up on this observation by showing that there is in fact never a price to pay for adaptivity if we specialise to any of the other common supervised online learning losses: our results cover log loss, (linear and non-parametric) logistic regression, square loss prediction, and (linear and non-parametric) least-squares regression. We also fill in several gaps in the literature by providing matching lower bounds with an explicit dependence on U. In all cases we obtain scale-free algorithms, which are suitably invariant under rescaling of the data. Our general goal is to establish achievable rates without concern for computational efficiency, but for linear logistic regression we also provide an adaptive method that is as efficient as the recent non-adaptive algorithm by Agarwal et al. (2021).
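A concrete example of a parameter-free method in this line of work is coin betting via the Krichevsky-Trofimov estimator, sketched below; it is illustrative of how one adapts to the comparator magnitude U without tuning, not one of the algorithms analysed in the paper, and it assumes 1-Lipschitz losses so that |g_t| ≤ 1.

```python
def coin_betting_1d(grads, eps=1.0):
    """One-dimensional parameter-free online learner via Krichevsky-Trofimov
    coin betting: assumes |g_t| <= 1 (e.g., 1-Lipschitz losses) and adapts
    to the comparator norm U with no tuning beyond the initial wealth eps."""
    wealth, coin_sum, plays = eps, 0.0, []
    for t, g in enumerate(grads, start=1):
        beta = coin_sum / t      # KT betting fraction, always in (-1, 1)
        w = beta * wealth        # prediction = bet a fraction of current wealth
        plays.append(w)
        wealth -= g * w          # wealth update; stays positive since |g*beta| < 1
        coin_sum += -g           # coin outcome c_t = -g_t
    return plays
```

Because the bet is always a fraction of the accumulated wealth, the predictions grow automatically toward large comparators when the gradients are consistently signed, which is the source of the adaptivity to U.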