2020
DOI: 10.48550/arxiv.2006.08558
Preprint

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Abstract: To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction (MCR²), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class. We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantee…
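
Concretely, the principle maximizes the coding rate reduction ΔR(Z, Π, ε) = R(Z, ε) − R_c(Z, ε | Π), where R(Z, ε) = ½ log det(I + d/(mε²) ZZᵀ) is a rate-distortion estimate for all m features Z ∈ R^(d×m) and R_c is the class-size-weighted sum of the same rate over the class-conditional subsets given by the membership Π. Below is a minimal NumPy sketch of this objective under those definitions; the function names and the default ε are illustrative choices, not taken from the authors' released code.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z, eps): rate-distortion estimate for the whole feature matrix Z (d x m)."""
    d, m = Z.shape
    logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * Z @ Z.T)[1]
    return 0.5 * logdet

def class_coding_rate(Z, labels, eps=0.5):
    """R_c(Z, eps | Pi): rates of the class-conditional subsets, weighted by class size."""
    d, m = Z.shape
    total = 0.0
    for j in np.unique(labels):
        Zj = Z[:, labels == j]  # features belonging to class j
        mj = Zj.shape[1]
        logdet = np.linalg.slogdet(np.eye(d) + (d / (mj * eps**2)) * Zj @ Zj.T)[1]
        total += (mj / (2.0 * m)) * logdet
    return total

def coding_rate_reduction(Z, labels, eps=0.5):
    """Delta R = R - R_c: the quantity MCR^2 maximizes over (normalized) features."""
    return coding_rate(Z, eps) - class_coding_rate(Z, labels, eps)
```

On unit-normalized features, maximizing ΔR pushes the union of class subspaces to span as many dimensions as possible (diverse) while keeping each class compact (discriminative), which is the geometry the abstract describes.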

Cited by 4 publications (8 citation statements)
References 34 publications
“…Indeed, the study of the interplay of the learned features and the final classifier enables precise characterization of the adversarial robustness of the learned model [112]. On the other hand, another line of recent work [33,34] empirically showed and argued that mapping each class to a linearly separable subspace with maximum dimension (instead of collapsing it to a vertex of the Simplex ETF) can improve robustness against random data corruptions such as label noise. Further empirical and theoretical investigations are needed to clarify our understanding of the potential benefits and full implications of NC for robustness.…”
Section: Discussion
confidence: 99%
“…In addition, the variability collapse of NC aligns with information bottleneck theory [32], which hypothesizes that neural networks seek to preserve only the minimal set of information in the learned feature representations needed for predicting the label, and hence discourage any additional variability. On the other hand, a recent line of work [33,34] questions whether NC improves robustness against data corruptions, by showing that diverse features that preserve the intrinsic structure of the data can better handle label corruptions. Therefore, an in-depth theoretical study of the NC phenomenon could provide further insights for addressing all these fundamental questions (see Section 5 for a detailed discussion).…”
Section: An Intriguing Phenomenon In Deep Network
confidence: 99%
“…• Feature normalization is a common practice in training deep networks. Recently, many existing results have demonstrated that training with feature normalization often improves the quality of the learned representation, yielding better class separation [5,13,21,42,60,73,75,81]. Such a representation is closely related to the discriminative representations studied in the literature; see, e.g., [42,60,73,80].…”
Section: Motivations and Contributions
confidence: 99%
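The normalization this statement refers to is, in its simplest reading, a projection of each learned feature onto the unit sphere before the loss is applied; a minimal sketch under that reading follows (the function name and the small guard against division by zero are my own additions).

```python
import numpy as np

def normalize_features(Z, eps=1e-8):
    """Project each feature (a column of Z) onto the unit sphere: z -> z / ||z||_2."""
    return Z / (np.linalg.norm(Z, axis=0, keepdims=True) + eps)
```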
“…On the other hand, different mathematical models have been proposed to model the structure of real-world data, such as the long tail theory (Zhu et al., 2014; Feldman, 2020; Feldman & Zhang, 2020) and the hidden manifold model (Goldt et al., 2019). Lately, Yu et al. (2020) unveiled several profound geometric properties of neural networks in the feature space from the viewpoint of discriminative representations (see also Anonymous (2021)).…”
Section: Geometric Properties Of Neural Networks
confidence: 99%