Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/406

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

Abstract: In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes the payoffs are bounded or sub-Gaussian, an assumption that may be violated in some scenarios such as financial markets. To address this issue, we analyze linear bandits with heavy-tailed payoffs, where the payoffs admit finite (1+epsilon)-th moments for some epsilon in (0, 1]. Through median of means and dynamic truncation, we propose two novel algorithms which enjoy a sublinear regret bound of Õ(…)
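The two estimation tools named in the abstract, median of means and dynamic truncation, are standard robust-mean estimators for heavy-tailed data. The sketch below is a minimal illustration of both for a scalar reward stream with a bounded (1+epsilon)-th moment; the function names, threshold schedule, and constants are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def median_of_means(samples, num_blocks):
    """Median-of-means: split the samples into blocks, average each block,
    and return the median of the block means (robust to heavy tails)."""
    samples = np.asarray(samples, dtype=float)
    blocks = np.array_split(samples, num_blocks)
    return float(np.median([block.mean() for block in blocks]))

def truncated_mean(samples, epsilon, moment_bound, delta):
    """Truncated mean: clip each sample at a threshold that grows with the
    sample index, then average.  The threshold follows the usual analysis for
    rewards with a bounded (1+epsilon)-th moment (illustrative constants)."""
    samples = np.asarray(samples, dtype=float)
    idx = np.arange(1, len(samples) + 1)
    thresholds = (moment_bound * idx / np.log(1.0 / delta)) ** (1.0 / (1.0 + epsilon))
    clipped = np.where(np.abs(samples) <= thresholds, samples, 0.0)
    return float(clipped.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Pareto noise has heavy tails; the plain empirical mean fluctuates badly here.
    rewards = 1.0 + rng.pareto(1.5, size=2000)
    print("empirical mean :", rewards.mean())
    print("median of means:", median_of_means(rewards, num_blocks=20))
    print("truncated mean :", truncated_mean(rewards, epsilon=0.5,
                                             moment_bound=10.0, delta=0.01))
```

Median of means spreads the influence of outliers across blocks, while truncation trades a small bias for much lighter tails; in the linear-bandit setting, estimators of this kind are applied to the observed payoffs before the usual regression and confidence-bound machinery.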

Cited by 7 publications (6 citation statements)
References 1 publication

“…For multi-armed bandits, the question whether there is a single algorithm achieving near-optimal regret bounds in both the adversarial and the stochastic regimes was first asked by Bubeck and Slivkins (2012). A series of follow-up works refined the bounds through different techniques (Seldin and Slivkins, 2014; Auer and Chiang, 2016; Seldin and Lugosi, 2017; Wei and Luo, 2018; Zimmert and Seldin, 2019; Ito, 2021). One of the most successful approaches is developed by Wei and Luo (2018); Zimmert and Seldin (2019); Ito (2021), who demonstrated that a simple Online Mirror Descent (OMD) or Follow the Regularized Leader (FTRL) algorithm, which was originally designed only for the adversarial case, is able to achieve the best of both worlds.…”
Section: Related Work
confidence: 99%
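The FTRL approach mentioned in the quoted passage (e.g., Zimmert and Seldin's Tsallis-INF) can be summarized in a few lines. The sketch below uses the 1/2-Tsallis-entropy regularizer with importance-weighted loss estimates; the learning-rate schedule, the bisection for the normalizer, and the function names are simplifying assumptions rather than the cited authors' exact algorithm.

```python
import numpy as np

def tsallis_inf(loss_fn, num_arms, horizon, rng=None):
    """FTRL with the 1/2-Tsallis entropy (simplified best-of-both-worlds sketch).

    loss_fn(t, arm) should return the observed loss of `arm` at round t in [0, 1].
    Returns the list of arms played."""
    rng = rng or np.random.default_rng()
    cum_loss = np.zeros(num_arms)            # importance-weighted cumulative losses
    plays = []
    for t in range(1, horizon + 1):
        eta = 1.0 / np.sqrt(t)               # common learning-rate choice
        # FTRL with 1/2-Tsallis entropy gives p_i = 4 / (eta * (L_i - x))^2,
        # where the normalizer x < min_i L_i is found by bisection so sum(p) = 1.
        lo = cum_loss.min() - 4.0 * num_arms / eta
        hi = cum_loss.min() - 1e-12
        for _ in range(60):
            x = 0.5 * (lo + hi)
            total = np.sum(4.0 / (eta * (cum_loss - x)) ** 2)
            if total < 1.0:
                lo = x
            else:
                hi = x
        p = 4.0 / (eta * (cum_loss - 0.5 * (lo + hi))) ** 2
        p /= p.sum()                          # guard against numerical drift
        arm = int(rng.choice(num_arms, p=p))
        loss = loss_fn(t, arm)
        cum_loss[arm] += loss / p[arm]        # unbiased importance-weighted update
        plays.append(arm)
    return plays
```

For instance, tsallis_inf(lambda t, a: float(a != 0), num_arms=5, horizon=10000) concentrates plays on arm 0 in a stochastic instance, while the same algorithm family is the one whose adversarial guarantees are analyzed in the works cited above.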
“…Other works consider an adversarial model of nonstationarity such as ours. Among these are works such as Auer, Cesa-Bianchi, et al. (2002); Bubeck and Slivkins (2012); Seldin and Slivkins (2014); Auer and Chiang (2016), where the performance is compared to the single best action, and others considering a dynamic notion of regret. In these latter works, bounds on the worst-case dynamic regret are given as a function of parameters characterizing the degree of non-stationarity of the environment.…”
Section: Other Related Literature
confidence: 99%
“…By adapting the technique introduced by Auer (2002) for the underlying learning scenario (cf. Chu et al., 2011; Li et al., 2017; Xue et al., 2020), we can extend the COLSTIM algorithm to SUP-COLSTIM (Algorithm 2) in order to obtain a regret bound of order Õ(√(dT) log(n)) without making additional assumptions on the Gram matrix as in Corollary 3.3. The idea is to embed the choice mechanism of COLSTIM into a stage-wise approach which keeps track of "sufficiently accurately estimated promising arms" (cf.…”
Section: Sup-CoLSTIM
confidence: 99%
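The "stage-wise approach which keeps track of sufficiently accurately estimated promising arms" refers to the SupLinRel/SupLinUCB construction of Auer (2002) and Chu et al. (2011). The sketch below shows that generic filtering scheme for a fixed feature matrix; it is an illustration of the technique, not SUP-COLSTIM itself, and the constants, widths, and function names are assumptions.

```python
import numpy as np

def sup_lin_filter(features, reward_fn, horizon, delta=0.01):
    """Generic SupLinUCB-style stage-wise filter (after Auer 2002, Chu et al. 2011).

    features: (n_arms, d) array of arm feature vectors.
    reward_fn(t, arm): observed reward of `arm` at round t.
    Per-stage regression data are kept separate so the estimates used for
    filtering stay independent of the actions chosen in other stages."""
    n_arms, d = features.shape
    n_stages = max(1, int(np.ceil(np.log2(np.sqrt(horizon)))))
    A = [np.eye(d) for _ in range(n_stages)]   # regularized Gram matrices
    b = [np.zeros(d) for _ in range(n_stages)]
    alpha = np.sqrt(0.5 * np.log(2.0 * horizon * n_arms / delta))
    total_reward = 0.0
    for t in range(horizon):
        active = np.arange(n_arms)
        s = 0
        while True:
            theta = np.linalg.solve(A[s], b[s])
            A_inv = np.linalg.inv(A[s])
            X = features[active]
            mean = X @ theta
            width = alpha * np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
            if np.all(width <= 1.0 / np.sqrt(horizon)):
                arm, s_used = active[np.argmax(mean + width)], None   # exploit
            elif np.any(width > 2.0 ** -(s + 1)):
                arm, s_used = active[np.argmax(width)], s             # explore
            else:
                best = np.max(mean + width)                           # filter arms
                active = active[mean + width >= best - 2.0 ** -s]
                s += 1
                continue
            x, r = features[arm], reward_fn(t, arm)
            total_reward += r
            if s_used is not None:            # only exploration rounds update
                A[s_used] += np.outer(x, x)
                b[s_used] += r * x
            break
    return total_reward
```

According to the quoted passage, SUP-COLSTIM embeds COLSTIM's pairwise choice mechanism into this kind of stage-wise skeleton in place of the plain exploit step shown here.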