A grand challenge in representation learning is the development of computational algorithms that learn the different explanatory factors of variation behind high-dimensional data. Representation models (usually referred to as encoders) are often trained to optimize performance on the training data, when the real objective is to generalize well to other (unseen) data. The first part of this chapter is devoted to providing an overview of and introduction to fundamental concepts in statistical learning theory and the Information Bottleneck principle. It serves as a mathematical basis for the technical results given in the second part, in which an upper bound to the generalization gap corresponding to the cross-entropy risk is derived. When this bound, used as a penalty term with a suitable multiplier, and the empirical cross-entropy risk are minimized jointly, the problem is equivalent to optimizing the Information Bottleneck objective with respect to the empirical data distribution. This result provides an interesting connection between mutual information and generalization, and helps to explain why noise injection during the training phase can improve the generalization ability of encoder models and enforce invariances in the resulting representations.
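For concreteness, the Information Bottleneck objective referred to above is commonly stated in the following Lagrangian form (the precise penalty and multiplier convention used in the technical part of the chapter may differ; this is only a reference sketch):
\[
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta\, I(Z;Y), \qquad \beta > 0,
\]
where \(X\) denotes the input data, \(Y\) the target variable, \(Z\) the representation produced by the encoder \(p(z \mid x)\), and \(\beta\) trades off compression of the input, measured by \(I(X;Z)\), against preservation of relevant information, measured by \(I(Z;Y)\). In these terms, the equivalence mentioned above states that jointly minimizing the empirical cross-entropy risk and a suitably weighted penalty related to \(I(X;Z)\) amounts to optimizing this objective with the mutual information terms evaluated under the empirical data distribution.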
Shannon famously formulated the fundamental problem of communication as that of reproducing "at one point a message selected at another point." Shannon further argued that the meaning of a message is subjective, i.e., dependent on the observer, and irrelevant to the engineering problem of communication. However, what does matter for the theory of communication is finding suitable representations for given data. In source coding, for example, one generally aims at distilling the relevant information from the data by removing unnecessary redundancies. This can be cast in information-theoretic terms, as higher redundancy makes data more predictable and lowers its information content.

In the context of learning [3,4], we propose to distinguish two rather different aspects of data: information and knowledge. Information contained in data is unpredictable and random, while additional structure and redundancy in the data stream constitutes knowledge about the data-generation process, which a learner must acquire. Indeed, according to connectionist models [5], the redundancy contained within messages enables the brain to build up its cognitive maps, and the statistical regularities in these messages are used for this purpose. Hence, this knowledge, provided by redundancy [6,7] in the data, must be what drives unsupervised learning. While information theory is a unique success story, from its birth it discarded knowledge as irrelevant to the engineering problem of communication. However, knowledge is recognized as a critical, almost central, component of representation learning. The present monograph provides an information-theoretic treatment of this problem.

Knowledge representation. The data deluge of recent decades leads to new expectations for scientific discoveries from massive data. While mankind is drowning in...