In this paper, we propose an information-theoretic approach to designing functional representations that extract the hidden common structure shared by a set of random variables. The main idea is to measure the common information among the random variables by Watanabe's total correlation, and then find the hidden attributes of these random variables such that the common information is reduced the most given these attributes. We show that these attributes can be characterized by an exponential family specified by the eigen-decomposition of some pairwise joint distribution matrix. We then adopt the log-likelihood functions for estimating these attributes as the desired functional representations of the random variables, and show that such representations are informative for describing the common structure. Moreover, we design both the multivariate alternating conditional expectation (MACE) algorithm to compute the proposed functional representations for discrete data, and a novel neural network training approach for continuous or high-dimensional data. Furthermore, we show that our approach has deep connections to existing techniques, such as Hirschfeld-Gebelein-Rényi (HGR) maximal correlation, linear principal component analysis (PCA), and consistent functional maps, thereby establishing insightful connections between information theory and machine learning. Finally, the performance of our algorithms is validated by numerical simulations.

1 Specifically, for random variables $X_1, \ldots, X_d$, the total correlation is defined as the Kullback-Leibler (K-L) divergence $D(P_{X_1 \cdots X_d} \,\|\, P_{X_1} \cdots P_{X_d})$ between the joint distribution and the product of the marginal distributions.

2 Note that $I(U; X^d)$ measures the amount of information of $U$ about the whole $X^d$, while $L(X^d \,|\, U)$ measures the amount of information only about the common structure. The constraint $I(U; X^d) \leq \delta$ allows us to focus on a low-dimensional attribute of $W$, where we typically choose $\delta$ to be small.
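As a concrete illustration of the definition in footnote 1, the following is a minimal sketch of a plug-in (empirical) estimator of the total correlation for discrete data; it is not part of the paper's proposed algorithms. The function name `total_correlation` and the use of NumPy are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def total_correlation(samples):
    """Plug-in estimate of Watanabe's total correlation
    D(P_{X_1...X_d} || P_{X_1}...P_{X_d}), in nats,
    from an (n, d) array of discrete samples."""
    samples = np.asarray(samples)
    n, d = samples.shape
    # Empirical joint distribution over observed d-tuples.
    joint = Counter(map(tuple, samples))
    # Empirical marginal distribution of each X_i.
    marginals = [Counter(samples[:, i]) for i in range(d)]
    tc = 0.0
    for x, cnt in joint.items():
        p_joint = cnt / n
        p_prod = np.prod([marginals[i][x[i]] / n for i in range(d)])
        tc += p_joint * np.log(p_joint / p_prod)
    return tc

# Usage: two identical bits share log 2 ≈ 0.693 nats of common
# information, while two independent bits share (almost) none.
rng = np.random.default_rng(0)
b = rng.integers(0, 2, size=10000)
print(total_correlation(np.stack([b, b], axis=1)))             # ≈ 0.693
print(total_correlation(rng.integers(0, 2, size=(10000, 2))))  # ≈ 0
```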