2023
DOI: 10.1109/tit.2023.3249636
Information-Theoretic Analysis of Minimax Excess Risk

Cited by 5 publications (7 citation statements)
References 22 publications
“…This result has since been extended in various forms, mostly concentrating on providing information-theoretic bounds for the generalization capabilities of learning algorithms, instead of looking at the excess risk; see, e.g., Raginsky et al [5], Lugosi and Neu [6], Jose and Simeone [7], and the references therein, just to mention a few of these works. The most relevant recent work relating to our bounds in Section 3 seems to be Xu and Raginsky [4], where, among other things, information-theoretic bounds were developed on the excess risk in a Bayesian learning framework; see also Hafez-Kolahi et al [8]. The bounds in [4] are not on the excess risk L*(Y|T(X)) − L*(Y|X); they involve training data, but their forms are similar to ours.…”
Section: Relationship With Prior Work
Citation type: mentioning
confidence: 87%
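As a reading aid for the excerpt above (not part of the cited text), the excess-risk quantity it refers to can be written out. Assuming the standard Bayes-risk reading of the notation, with ℓ a loss function and ψ ranging over decision rules (symbols not defined in the excerpt):

\[
L^{*}(Y \mid X) \;=\; \inf_{\psi} \, \mathbb{E}\!\left[\ell\bigl(\psi(X), Y\bigr)\right],
\qquad
L^{*}(Y \mid T(X)) - L^{*}(Y \mid X) \;\ge\; 0 .
\]

The nonnegativity holds because any decision rule applied to T(X) is, by composition, also a decision rule applied to X, so restricting predictions to the representation T cannot lower the Bayes risk.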
“…Notice that the bound is expressed as MI terms each involving U_i and ∆L_{i,k}, both being discrete random variables. This has not arisen in the previous chained weight-based MI bounds, which either contain the continuous random variable S (Asadi et al, 2018; Zhou et al, 2022b; Clerico et al, 2022) or are conditioned on the continuous random variable Z (Hafez-Kolahi et al, 2020). Additionally, by the master definition of MI (Cover & Thomas, 2006, Eq.…”
Section: By the Independence of U_i and Z, I(∆L…
Citation type: mentioning
confidence: 97%
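Since the excerpt stresses that both U_i and ∆L_{i,k} are discrete, the MI terms it refers to reduce to finite sums over a joint pmf. A minimal Python sketch, illustrative only and not code from either paper; the joint distribution below is a made-up example:

```python
import numpy as np

def discrete_mutual_information(joint):
    """I(U; D) in nats for two discrete random variables, given their joint pmf
    as a 2-D array with rows indexed by U and columns by D."""
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()                        # normalize to a valid pmf
    pu = p.sum(axis=1, keepdims=True)      # marginal of U
    pd = p.sum(axis=0, keepdims=True)      # marginal of D
    mask = p > 0                           # 0 * log 0 terms contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (pu * pd)[mask])))

# Hypothetical joint pmf of a binary U_i and a binary loss difference.
joint = np.array([[0.30, 0.20],
                  [0.10, 0.40]])
print(discrete_mutual_information(joint))  # ≈ 0.086 nats
```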
“…Furthermore, it is possible to establish further tightened loss-difference MI bounds for more general loss functions than those required in Theorem 3.2. Specifically, the loss function can be unbounded and continuous, as presented in the next theorem, where we apply the chaining technique (Asadi et al, 2018; Hafez-Kolahi et al, 2020; Zhou et al, 2022b; Clerico et al, 2022) and the obtained bound consists of MI terms between U_i and the successively quantized versions of ∆L_i. To that end, let Err_i(∆_i) := (−1)^{U_i} ∆_i and let Γ ⊆ R be the range of ∆_i.…”
Section: By the Independence of U_i and Z, I(∆L…
Citation type: mentioning
confidence: 99%
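To illustrate what "successively quantized versions of ∆L_i" could look like in the chaining argument referenced above, here is a small sketch using a dyadic grid of spacing 2^{-k}; this grid choice is an assumption for illustration, not the construction from the cited papers:

```python
import numpy as np

def dyadic_quantize(delta, k):
    """Round delta onto the grid of spacing 2**-k; larger k gives a finer
    discrete approximation of the continuous loss difference."""
    step = 2.0 ** (-k)
    return np.round(np.asarray(delta) / step) * step

delta_L = np.array([0.137, -0.642, 0.305])   # hypothetical loss differences
for k in range(1, 4):
    print(k, dyadic_quantize(delta_L, k))
# k=1 -> [0.0, -0.5, 0.5]; k=2 -> [0.25, -0.75, 0.25]; k=3 -> [0.125, -0.625, 0.25]
```

A chained bound of the kind described would then sum MI terms between U_i and these quantized variables across the resolutions k.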
“…The line of work exploiting information measures to bound the expected generalization error started in (Russo and Zou, 2016; Xu and Raginsky, 2017) and was then refined with a variety of approaches considering Conditional Mutual Information (Steinke and Zakynthinou, 2020; Haghifam et al, 2020), the Mutual Information between individual samples and the hypothesis (Bu et al, 2019), or improved versions of the original bounds (Issa et al, 2019; Hafez-Kolahi et al, 2020). Other approaches employed the Kullback-Leibler Divergence with a PAC-Bayesian approach (McAllester, 2013; Zhou et al, 2018).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
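For orientation, the starting point of the line of work quoted above is the mutual-information generalization bound of Xu and Raginsky (2017): if the loss is σ-subgaussian under the data distribution, the hypothesis W learned from an n-sample training set S satisfies

\[
\bigl|\mathbb{E}\!\left[\operatorname{gen}(W,S)\right]\bigr|
\;\le\;
\sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)},
\]

where gen(W,S) is the gap between population risk and empirical risk and I(W;S) is the mutual information between the learned hypothesis and the training data; the refinements listed in the excerpt tighten, condition, or decompose this information term.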