We aim to design universal algorithms for online convex optimization that can handle multiple common types of loss functions simultaneously. The previous state-of-the-art universal method achieves minimax optimality for general convex, exponentially concave, and strongly convex loss functions. However, whether smoothness can be exploited to further improve the theoretical guarantees has remained an open problem. In this paper, we answer this question affirmatively by developing a novel algorithm, namely UFO, which achieves O(√L*), O(d log L*), and O(log L*) regret bounds for the three types of loss functions, respectively, under the assumption of smoothness, where L* is the cumulative loss of the best comparator in hindsight and d is the dimensionality. Thus, our regret bounds are much tighter when the comparator has a small loss, while still ensuring minimax optimality in the worst case. In addition, UFO is the first algorithm to achieve an O(log L*) regret bound for strongly convex and smooth functions, which is tighter than the existing small-loss bound by an O(d) factor.
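To make the quantities in these bounds concrete, the following restates the standard definitions they rely on. The notation is assumed here rather than taken from the abstract: f_t is the loss revealed in round t, x_t is the learner's decision, and 𝒳 is the feasible domain. Assuming, as is usual for small-loss bounds, that each f_t is nonnegative and bounded by a constant c, L* grows at most linearly in T, so the small-loss bounds never exceed the worst-case rates:

```latex
\begin{aligned}
\mathrm{Regret}_T &= \sum_{t=1}^{T} f_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{X}} \sum_{t=1}^{T} f_t(\mathbf{x}), \\
L^* &= \min_{\mathbf{x} \in \mathcal{X}} \sum_{t=1}^{T} f_t(\mathbf{x}) \le cT \qquad (\text{if } 0 \le f_t \le c), \\
O\big(\sqrt{L^*}\big),\; O(d \log L^*),\; O(\log L^*)
  &\;\Longrightarrow\; O\big(\sqrt{T}\big),\; O(d \log T),\; O(\log T) \quad \text{in the worst case}.
\end{aligned}
```

The right-hand rates are the known minimax rates for general convex, exponentially concave, and strongly convex losses, respectively; when the best comparator's cumulative loss L* is small, the left-hand bounds can be far smaller.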
In this paper, we study the multi-objective bandits (MOB) problem, where a learner repeatedly selects one arm to play and then receives a reward vector consisting of multiple objectives. MOB has found many real-world applications, ranging from online recommendation to network routing. These applications typically contain contextual information that can guide the learning process, yet most existing work ignores it. To utilize this information, we associate each arm with a context vector and assume that the reward follows the generalized linear model (GLM). We adopt the notion of Pareto regret to evaluate the learner's performance and develop a novel algorithm for minimizing it. The essential idea is to apply a variant of the online Newton step to estimate the model parameters, use the upper confidence bound (UCB) policy on these estimates to construct an approximation of the Pareto front, and then choose one arm uniformly at random from the approximate Pareto front. Theoretical analysis shows that the proposed algorithm achieves an Õ(d√T) Pareto regret, where T is the time horizon and d is the dimension of the contexts, which matches the optimal result for the single-objective contextual bandit problem. Numerical experiments demonstrate the effectiveness of our method.
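The final selection step described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the per-arm UCB vectors (one optimistic estimate per objective) have already been computed from the GLM parameter estimates, whose online-Newton-step update is not reproduced here, and the function names `pareto_front` and `select_arm` are hypothetical.

```python
import numpy as np


def pareto_front(ucb):
    """Return indices of arms whose UCB vectors are not Pareto-dominated.

    ucb: array of shape (K, m), one row of optimistic reward estimates per arm
    (K arms, m objectives). Arm j dominates arm i if it is at least as good in
    every objective and strictly better in at least one.
    """
    ucb = np.asarray(ucb, dtype=float)
    front = []
    for i in range(ucb.shape[0]):
        # True for each arm j that dominates arm i (arm i never dominates itself).
        dominators = np.all(ucb >= ucb[i], axis=1) & np.any(ucb > ucb[i], axis=1)
        if not np.any(dominators):
            front.append(i)
    return front


def select_arm(ucb, rng):
    """Pick one arm uniformly at random from the approximate Pareto front."""
    return int(rng.choice(pareto_front(ucb)))


ucb = np.array([[0.9, 0.1],   # arm 0: best on objective 1
                [0.2, 0.8],   # arm 1: best on objective 2
                [0.5, 0.5],   # arm 2: a trade-off, not dominated
                [0.4, 0.4]])  # arm 3: dominated by arm 2
print(pareto_front(ucb))  # → [0, 1, 2]
rng = np.random.default_rng(0)
print(select_arm(ucb, rng))
```

The pairwise scan over arms costs O(K²m), which is adequate for a modest number of arms; an arm stays on the front whenever no other arm's UCB vector is at least as large in every objective and strictly larger in one.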
The Adam algorithm has become extremely popular for large-scale machine learning. Under the convexity condition, it has been proved to enjoy a data-dependent O(√T) regret bound, where T is the time horizon. However, whether strong convexity can be utilized to further improve the performance remains an open problem. In this paper, we give an affirmative answer by developing a variant of Adam (referred to as SAdam), which achieves a data-dependent O(log T) regret bound for strongly convex functions. The essential idea is to maintain a faster-decaying yet controlled step size to exploit strong convexity. In addition, under a special configuration of hyperparameters, SAdam reduces to SC-RMSprop, a recently proposed variant of RMSprop for strongly convex functions, for which we provide the first data-dependent logarithmic regret bound. Empirical results on optimizing strongly convex functions and training deep networks demonstrate the effectiveness of our method.
Preprint. Under review.
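The idea of a faster-decaying yet controlled step size can be illustrated with a small Adam-style loop whose step size shrinks as α/t instead of Adam's α/√t. This is a hypothetical sketch, not SAdam as specified in the paper: the schedule β2,t = 1 − 1/t, the denominator v_t + δ/t without a square root (in the style of SC-RMSprop), the default hyperparameters, the function name `sadam_style`, and the omission of a projection step are all assumptions made here for illustration.

```python
import numpy as np


def sadam_style(grad, x0, steps, alpha=10.0, beta1=0.9, delta=1.0):
    """Adam-style loop with an alpha/t step size (hypothetical SAdam-like sketch).

    Assumptions (not taken from the abstract): beta2_t = 1 - 1/t, the
    denominator is v_t + delta/t with no square root, and no projection.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment (momentum) estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        beta2_t = 1.0 - 1.0 / t  # makes v the running mean of squared gradients
        v = beta2_t * v + (1.0 - beta2_t) * g * g
        # Step size alpha/t decays faster than Adam's alpha/sqrt(t); dividing by
        # v + delta/t (not sqrt(v)) keeps the effective per-coordinate step bounded.
        x = x - (alpha / t) * m / (v + delta / t)
    return x


# Usage: minimize the 1-strongly convex f(x) = 0.5 * ||x - c||^2.
c = np.array([3.0, -1.0])
x_final = sadam_style(lambda x: x - c, x0=np.zeros(2), steps=5000)
print(np.round(x_final, 3))
```

With β2,t = 1 − 1/t, t·v_t equals the running sum of squared gradients, so the update simplifies to α·m_t / (Σ_s g_s² + δ). The factors of t cancel, which is why the α/t step decays quickly without the effective step ever exceeding α·|m_t|/δ.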