2018
DOI: 10.48550/arxiv.1808.07593
Preprint
Caveats for information bottleneck in deterministic scenarios

Abstract: Information bottleneck (IB) is a method for extracting information from one random variable X that is relevant for predicting another random variable Y. To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X; T) and high mutual information I(Y; T). The IB curve characterizes the set of bottleneck variables that achieve maximal I(Y; T) for a given I(X; T), and is typically explored by maximizing the IB Lagrangian, I(Y; T) − βI(X; T). In some cases, Y is a d…
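The IB Lagrangian described in the abstract can be sketched numerically for discrete distributions. The following is a minimal illustration, not the authors' method: the joint distribution p(x, y) and the encoder q(t|x) below are hypothetical, chosen only to show how I(X; T), I(Y; T), and the Lagrangian I(Y; T) − βI(X; T) are computed under the Markov chain T − X − Y.

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information (in nats) from a 2-D joint distribution matrix."""
    p_a = p_joint.sum(axis=1, keepdims=True)   # marginal over rows
    p_b = p_joint.sum(axis=0, keepdims=True)   # marginal over columns
    mask = p_joint > 0                         # skip zero-probability cells
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_b)[mask])))

# Hypothetical joint p(x, y): X has 4 states (rows), Y has 2 (columns).
p_xy = np.array([[0.25, 0.00],
                 [0.20, 0.05],
                 [0.05, 0.20],
                 [0.00, 0.25]])

# Hypothetical stochastic encoder q(t | x) defining the bottleneck T (2 states).
q_t_given_x = np.array([[0.9, 0.1],
                        [0.8, 0.2],
                        [0.2, 0.8],
                        [0.1, 0.9]])

p_x = p_xy.sum(axis=1)
p_xt = p_x[:, None] * q_t_given_x   # joint p(x, t) = p(x) q(t|x)
p_yt = p_xy.T @ q_t_given_x         # joint p(y, t) via the chain T - X - Y

beta = 0.5
ib_lagrangian = mutual_information(p_yt) - beta * mutual_information(p_xt)
```

Because T depends on Y only through X, the data-processing inequality guarantees I(Y; T) ≤ I(X; T); sweeping β trades compression of X against preserved information about Y, tracing out the IB curve the abstract refers to.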

Cited by 9 publications (12 citation statements)
References 23 publications
“…Recent work analyzes deep neural networks through the lens of information theory [73][74][75][76][77][78][79][80][81][82][83], often computing measures of mutual information similar to those we discuss. Our result that the only usable information in a dataset is contained in its sample-sample second moment matrix K may inform or constrain this type of analysis.…”
Section: Discussion
confidence: 99%
“…The IB principle has recently been introduced for theoretical understanding and analysis of deep neural networks [2,21,34,43,48]. The authors optimize the networks with an iterative Blahut-Arimoto algorithm, which is infeasible in practical systems.…”
Section: Related Work
confidence: 99%
“…= H(z) − H(z | y), where H(z) is the entropy of z [CT06]. … given one can overcome some caveats associated with this framework [KTVK18] and practical difficulties such as how to accurately evaluate mutual information with finitely many samples of degenerate distributions. … in case the labels can be corrupted or the learned features be tackled.…”
Section: Context and Motivation
confidence: 99%