2015 IEEE Information Theory Workshop (ITW)
DOI: 10.1109/itw.2015.7133169

Deep learning and the information bottleneck principle

Abstract: Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity…
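The abstract's central claim is that each layer T of a DNN can be placed on the information plane by two mutual-information coordinates, I(X;T) and I(T;Y). The sketch below is not from the paper; it illustrates one common way follow-up work estimates these quantities, by binning the activations of a toy random-projection "layer" on synthetic data. The dataset, architecture, and bin count are all illustrative assumptions.

```python
# Minimal sketch (not from the paper): estimating the information-plane
# coordinates I(X;T) and I(T;Y) of one hidden layer T by discretizing
# its activations and treating each binned pattern as one symbol.
import numpy as np

def discrete_mutual_information(a, b):
    """I(A;B) in bits for two equal-length sequences of hashable symbols."""
    n = len(a)
    joint, pa, pb = {}, {}, {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        pa[x] = pa.get(x, 0) + 1
        pb[y] = pb.get(y, 0) + 1
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * np.log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi

# Toy data: X is an 8-bit binary input, Y a noisy parity label,
# T a "hidden layer" from a random projection + tanh, then binned.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(4096, 8))
Y = (X.sum(axis=1) % 2) ^ (rng.random(4096) < 0.05).astype(int)
W = rng.normal(size=(8, 3))
T = np.tanh(X @ W)
T_binned = np.digitize(T, bins=np.linspace(-1, 1, 8))

x_id = [tuple(r) for r in X]          # each input pattern is one symbol
t_id = [tuple(r) for r in T_binned]   # each binned activation pattern, too
print("I(X;T) ≈", discrete_mutual_information(x_id, t_id), "bits")
print("I(T;Y) ≈", discrete_mutual_information(t_id, list(Y)), "bits")
```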

Cited by 1,139 publications (1,027 citation statements); references 11 publications.
“…More recently, there is a small line of emerging work investigating the behaviour of neural networks from an information-theoretic perspective [15][16][17][18][19][20], with some work going as far back as [21]. The most relevant of these is the work by Shwartz-Ziv and Tishby [16], who show that feed-forward deep neural networks undergo a dynamic transition between drift- and diffusion-like regimes during training.…”
Section: Related Work
confidence: 99%
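The drift/diffusion transition mentioned in this excerpt is usually diagnosed through the signal-to-noise ratio of mini-batch gradients: high when the mean gradient dominates (drift), low when batch-to-batch fluctuations dominate (diffusion). The sketch below only illustrates that diagnostic on a toy logistic-regression model; it is not the experimental setup of [16], and the data, model, and hyperparameters are assumptions.

```python
# Hedged sketch of the drift/diffusion diagnostic: track, per epoch, the norm
# of the mean mini-batch gradient ("drift") against the spread of gradients
# across batches ("diffusion") while training a toy logistic regression.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 20))
w_true = rng.normal(size=20)
y = (X @ w_true + 0.5 * rng.normal(size=2000) > 0).astype(float)

w = np.zeros(20)
lr, batch = 0.1, 64
for epoch in range(30):
    grads = []
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        p = 1.0 / (1.0 + np.exp(-(X[b] @ w)))    # sigmoid prediction
        g = X[b].T @ (p - y[b]) / len(b)         # logistic-loss gradient
        grads.append(g)
        w -= lr * g
    G = np.array(grads)
    signal = np.linalg.norm(G.mean(axis=0))      # "drift" component
    noise = G.std(axis=0).mean()                 # "diffusion" component
    if epoch % 5 == 0:
        print(f"epoch {epoch:2d}  gradient SNR ≈ {signal / noise:.3f}")
```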
“…Conventionally, the process of generalizing the performance of the classifier for eventual new data requires a series of good practices in the use of the available data to train and then evaluate it [2,3]. In this supervised scheme, the evaluation of the performance of the classifier involves the comparison of the true labels K vs. the predicted labels K̂, as the abstracted diagram in Fig.…”
Section: (B)
confidence: 99%
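The evaluation step this excerpt describes, comparing the true labels K with the predicted labels K̂, reduces to tabulating how often each true class is mapped to each predicted class. A minimal sketch with made-up toy labels:

```python
# Illustrative only: compare true labels K with predictions K_hat via a
# confusion matrix and overall accuracy (toy, hand-written label vectors).
import numpy as np

K     = np.array([0, 0, 1, 1, 2, 2, 2, 1])   # true labels
K_hat = np.array([0, 1, 1, 1, 2, 2, 0, 1])   # classifier predictions

n_classes = 3
confusion = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(K, K_hat):
    confusion[t, p] += 1                      # row = true class, col = predicted

print("confusion matrix (rows = K, cols = K_hat):")
print(confusion)
print("accuracy:", np.trace(confusion) / len(K))
```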
“…Design of the optimal distributed quantizer for the above scenario has been considered by the authors of [13][14][15][16][17][18], who suggest cyclic algorithms based on alternating minimization [19] to find the optimal quantization rules. The algorithm starts with initial guesses for the quantizers, …”
Section: International Journal of Distributed Sensor Network
confidence: 99%
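The alternating-minimization pattern this excerpt refers to can be seen in its simplest form on a single scalar quantizer (a Lloyd-Max-style iteration): fix the codebook and optimize the partition, then fix the partition and optimize the codebook, and repeat. The distributed, multi-quantizer designs in [13]-[18] cycle over several quantization rules in the same spirit; the sketch below, with an assumed Gaussian source and an assumed initial codebook, only shows the basic pattern.

```python
# Hedged sketch of alternating minimization for quantizer design, shown on a
# single scalar quantizer: alternate between nearest-level assignment
# (partition step) and centroid updates (codebook step).
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(size=10000)                 # source to be quantized
levels = np.array([-2.0, -0.5, 0.5, 2.0])        # initial codebook guess

for _ in range(50):
    # Step 1: with the codebook fixed, assign each sample to its nearest level.
    assignment = np.argmin(np.abs(samples[:, None] - levels[None, :]), axis=1)
    # Step 2: with the partition fixed, move each level to its cell's centroid.
    for k in range(len(levels)):
        cell = samples[assignment == k]
        if len(cell) > 0:
            levels[k] = cell.mean()

assignment = np.argmin(np.abs(samples[:, None] - levels[None, :]), axis=1)
mse = np.mean((samples - levels[assignment]) ** 2)
print("final levels:", np.round(levels, 3), " distortion:", round(mse, 4))
```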