2022
DOI: 10.48550/arxiv.2206.02656
Preprint

A Regret-Variance Trade-Off in Online Learning

Abstract: We consider prediction with expert advice for strongly convex and bounded losses, and investigate trade-offs between regret and "variance" (i.e., squared difference of learner's predictions and best expert predictions). With K experts, the Exponentially Weighted Average (EWA) algorithm is known to achieve O(log K) regret. We prove that a variant of EWA either achieves a negative regret (i.e., the algorithm outperforms the best expert), or guarantees an O(log K) bound on both variance and regret. Building on thi…
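
For readers unfamiliar with the setting, below is a minimal sketch of the standard EWA forecaster with squared loss on [0, 1], together with the regret and "variance" quantities the abstract refers to. The learning rate eta, the loss choice, and the toy usage data are illustrative assumptions, not taken from the paper (which analyzes a variant of EWA).

```python
import numpy as np

def ewa_forecaster(expert_preds, outcomes, eta=2.0):
    """Standard Exponentially Weighted Average (EWA) forecaster.

    expert_preds: array of shape (T, K) -- predictions of K experts over T rounds
    outcomes:     array of shape (T,)   -- observed outcomes in [0, 1]
    eta:          learning rate (illustrative choice; tuning depends on the loss)

    Returns learner predictions, cumulative regret against the best expert, and the
    "variance" term sum_t (learner_t - best_expert_t)^2 mentioned in the abstract.
    """
    T, K = expert_preds.shape
    log_w = np.zeros(K)                      # log-weights, uniform at the start
    learner_preds = np.empty(T)
    expert_losses = np.zeros(K)

    loss = lambda p, y: (p - y) ** 2         # squared loss: strongly convex, bounded on [0, 1]

    for t in range(T):
        w = np.exp(log_w - log_w.max())      # normalize weights stably
        w /= w.sum()
        learner_preds[t] = w @ expert_preds[t]                 # weighted-average prediction
        log_w -= eta * loss(expert_preds[t], outcomes[t])      # multiplicative weight update
        expert_losses += loss(expert_preds[t], outcomes[t])

    learner_loss = loss(learner_preds, outcomes).sum()
    best = expert_losses.argmin()
    regret = learner_loss - expert_losses[best]
    variance = ((learner_preds - expert_preds[:, best]) ** 2).sum()
    return learner_preds, regret, variance

# Toy usage: two experts, one consistently close to the truth.
T = 50
y = np.full(T, 0.7)
preds = np.column_stack([np.full(T, 0.7), np.full(T, 0.2)])
_, regret, var = ewa_forecaster(preds, y)
```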

Cited by 2 publications (2 citation statements)
References 10 publications

“…We also would like to mention that excess risk bounds can be derived from regret bounds for online learning algorithms via the online-to-batch conversion. We refer to (Gaillard and Wintenberger, 2018; Van der Hoeven et al., 2022) for related high-probability upper bounds. However, in our context, the online-to-batch approach yields risk bounds with additional logarithmic factors, which makes it unsuitable for our purposes.…”
Section: Related Work
confidence: 99%
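
For context on the online-to-batch conversion mentioned in this excerpt, here is the standard argument in informal form; it is a textbook fact about convex losses, not a result specific to the cited works.

```latex
% Online-to-batch conversion (standard argument, stated informally).
% Run an online algorithm on i.i.d. samples z_1, ..., z_T, producing iterates f_1, ..., f_T.
% For a convex loss, the averaged predictor \bar{f} = \frac{1}{T}\sum_{t=1}^{T} f_t satisfies, in expectation,
\[
  \mathbb{E}\bigl[R(\bar{f})\bigr] - \inf_{f \in \mathcal{F}} R(f)
  \;\le\; \frac{\mathbb{E}\bigl[\mathrm{Regret}_T\bigr]}{T},
\]
% where R(f) = \mathbb{E}_z[\ell(f, z)] is the population risk (Jensen's inequality on the average).
% High-probability versions of this bound typically pay extra logarithmic factors,
% which is the drawback alluded to in the quoted passage.
```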
“…In that context, a common practice is to formulate the problem as one of the renowned multi-armed bandit variants [40] and then conduct regret analysis, showing that the expected total regret (defined as the gap between the total utility achieved by a given policy and that of a prophet-optimal benchmark) is upper bounded by a certain function of the total time horizon [41][42][43][44]. A few recent works investigate the potential trade-off between variance and regret in online learning; see, e.g., [45,46]. In particular, Vakili et al. [46] introduced and analyzed the performance of several risk-averse policies in both bandit and full-information settings under the mean-variance metric [47].…”
Section: Main Techniques and Other Related Work
confidence: 99%
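
As a rough illustration of the mean-variance criterion mentioned in this excerpt, here is a hedged sketch. The function name, the risk-tolerance parameter rho, and the normalization are assumptions made for illustration and may differ from the exact formulation in [47] and in Vakili et al.

```python
import numpy as np

def empirical_mean_variance(rewards, rho=1.0):
    """Empirical mean-variance of a reward sequence, MV = variance - rho * mean.

    This follows the risk-averse bandit literature's mean-variance criterion
    (smaller is better); rho trades off the two terms. The exact constants and
    normalization used in the cited works may differ -- this is only a sketch.
    """
    rewards = np.asarray(rewards, dtype=float)
    return rewards.var() - rho * rewards.mean()

# Toy comparison of two arms with the same mean but different variance.
rng = np.random.default_rng(0)
steady = rng.normal(0.5, 0.05, size=1000)   # low-variance rewards
risky = rng.normal(0.5, 0.30, size=1000)    # high-variance rewards
print(empirical_mean_variance(steady), empirical_mean_variance(risky))
```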