2018
DOI: 10.1109/jstsp.2018.2846218

Compression-Based Regularization With an Application to Multitask Learning

Abstract: This paper investigates, from information-theoretic grounds, a learning problem based on the principle that any regularity in a given dataset can be exploited to extract compact features from the data, i.e., using fewer bits than needed to fully describe the data itself, in order to build meaningful representations of the relevant content (multiple labels). We begin by introducing the noisy lossy source coding paradigm with the log-loss fidelity criterion, which provides the fundamental tradeoffs between the cross-en…
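The abstract is truncated above. For context, here is a minimal sketch of the logarithmic-loss (log-loss) fidelity criterion it refers to, written in our own notation rather than the paper's: the reconstruction is a probability distribution \(\hat{p}\) over the label alphabet, and the distortion incurred on a realized label \(y\) is

\[
  d\bigl(y, \hat{p}\bigr) \;=\; -\log \hat{p}(y),
\]

so the expected distortion is the cross-entropy between the label distribution and the reconstruction, which appears to be the quantity the truncated sentence refers to.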


Cited by 11 publications (9 citation statements)
References 26 publications
“…It is easy to show that this region corresponds to the set of achievable values of relevance and rate (µ, R) for the corresponding noisy lossy source coding problem with logarithmic distortion as was defined in Section 1.3.3. This set is closed and convex and it is not difficult to show that [34]:…”
Section: 3.4 (mentioning)
confidence: 99%
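The expression at the end of this quote is cut off. As a hedged sketch in standard information-bottleneck notation (which may differ from the cited paper's), with X the observed data, Y the relevant label, and U the extracted representation, the achievable relevance-rate region is typically characterized as

\[
  \mathcal{R} \;=\; \bigl\{ (\mu, R) \,:\, \exists\, p(u \mid x)\ \text{such that}\ Y - X - U\ \text{form a Markov chain},\ I(U;Y) \ge \mu,\ I(U;X) \le R \bigr\},
\]

and closedness and convexity follow from the usual time-sharing argument.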
“…Therefore, it is of interest to explore links between information theory, representation learning and the information bottleneck, in order to cast insights onto the performance of deep neural networks under an information-theoretic lens. Preliminary steps in this direction are taken in [65,66].…”
Section: Problem 5: Generalization Error and The Information Bottleneck (mentioning)
confidence: 99%
“…An important part of this algorithm requires computing ξ(R), whose solution involves an optimization formulation with respect to the encoder f_n and the rate R [26]. For the computation of ξ(R) we use the algorithm presented in [29], which is a generalization of the Blahut-Arimoto algorithm [30]. For different blocklengths n, Tables I, II, and III show the values of (UB(n, R) − LB(n, R)).…”
Section: A Numerical Example (mentioning)
confidence: 99%
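The algorithm from [29] that the quote refers to, a generalization of Blahut-Arimoto for computing ξ(R), is not reproduced in this excerpt. As an illustrative, hedged sketch of the same family of alternating-minimization methods, the following Python snippet implements the standard iterative information-bottleneck update; the function name, numerical guards, and iteration count are assumptions made here, not the cited construction.

# Hedged sketch: a Blahut-Arimoto-style alternating update for the
# information bottleneck objective, shown only to illustrate the kind of
# iteration the quote refers to; it is not the algorithm from [29].
import numpy as np

def ib_blahut_arimoto(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Alternating updates for the objective I(T;X) - beta * I(T;Y) over p(t|x)."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                       # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]            # conditional p(y|x)

    # Random initialization of the stochastic encoder p(t|x).
    p_t_given_x = rng.random((p_xy.shape[0], n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t = p_t_given_x.T @ p_x                # marginal p(t)
        # Decoder p(y|t) via Bayes' rule: sum_x p(x, t) p(y|x) / p(t).
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_t /= np.maximum(p_t[:, None], 1e-12)

        # KL divergence D(p(y|x) || p(y|t)) for every (x, t) pair.
        log_ratio = np.log(np.maximum(p_y_given_x[:, None, :], 1e-12)) \
                  - np.log(np.maximum(p_y_given_t[None, :, :], 1e-12))
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)

        # Encoder update: p(t|x) proportional to p(t) * exp(-beta * KL).
        p_t_given_x = p_t[None, :] * np.exp(-beta * kl)
        p_t_given_x /= np.maximum(p_t_given_x.sum(axis=1, keepdims=True), 1e-12)

    return p_t_given_x

For a toy joint distribution such as p_xy = np.array([[0.4, 0.1], [0.1, 0.4]]), calling ib_blahut_arimoto(p_xy, n_clusters=2, beta=5.0) returns a soft encoder p(t|x); larger beta weights relevance I(T;Y) more heavily relative to compression I(T;X). A practical implementation would also track the objective value across iterations and stop once it plateaus.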
“…Under some mild conditions given in [29], there are guarantees that this optimization converges to ξ(R).…”
mentioning
confidence: 99%
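The "mild conditions" from [29] are not spelled out in this excerpt. As a hedged sketch of the usual argument behind convergence guarantees for Blahut-Arimoto-type schemes (our notation, not necessarily [29]'s), one considers the functional

\[
  F\bigl(q(t \mid x),\, q(y \mid t)\bigr) \;=\; I_q(T;X) \;+\; \beta \sum_{x,t} p(x)\, q(t \mid x)\, D\bigl(p(y \mid x) \,\|\, q(y \mid t)\bigr),
\]

which is convex in each argument when the other is held fixed; each alternating update does not increase F, and F is bounded below, so the sequence of objective values converges. Showing that the limit equals ξ(R), rather than a suboptimal stationary value, is what requires the additional conditions the quote alludes to.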