2021
DOI: 10.1093/imaiai/iaab007
Distributed information-theoretic clustering

Abstract: We study a novel multi-terminal source coding setup motivated by the biclustering problem. Two separate encoders observe two i.i.d. sequences $X^n$ and $Y^n$, respectively. The goal is to find rate-limited encodings $f(x^n)$ and $g(y^n)$ that maximize the mutual information $I(f(X^n); g(Y^n))/n$. We discuss connections of this problem with hypothesis testing against independence, pattern recognition and the information bottleneck method. Improving previous cardinality bounds for the inner and out…
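The objective in the abstract, $I(f(X^n); g(Y^n))/n$, can be illustrated in the single-letter ($n = 1$) case with a small numerical sketch. The toy source, the encoder maps and the helper names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information (in nats) of a 2-D joint pmf."""
    px = p_joint.sum(axis=1, keepdims=True)
    py = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0  # skip zero-probability cells in the log
    return float((p_joint[mask] * np.log(p_joint[mask] / (px @ py)[mask])).sum())

def encoded_mi(p_xy, f, g):
    """I(f(X); g(Y)) for deterministic single-letter encoders f and g,
    given as lists mapping each source symbol to an encoder output."""
    p_uv = np.zeros((max(f) + 1, max(g) + 1))
    for x in range(p_xy.shape[0]):
        for y in range(p_xy.shape[1]):
            p_uv[f[x], g[y]] += p_xy[x, y]
    return mutual_information(p_uv)

# Toy doubly symmetric binary source with crossover probability 0.1.
p_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
print(encoded_mi(p_xy, f=[0, 1], g=[0, 1]))  # identity encoders keep all of I(X;Y)
print(encoded_mi(p_xy, f=[0, 0], g=[0, 1]))  # a constant encoder destroys everything
```

Rate limits enter through how coarsely $f$ and $g$ quantize: the constant map $f = [0, 0]$ is the extreme zero-rate case, for which the objective collapses to zero.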

Cited by 30 publications (5 citation statements)
References 34 publications
“…Through powerful machine learning algorithms, we can dig out a lot of useful information from these data. In recent years, many different clustering algorithms have been proposed [2] . As one of the key technologies to deal with big data, they have been more and more widely used in digital image processing [3] , computer science [4][5] , species category analysis [6][7] , and other fields.…”
Section: Introduction
confidence: 99%
“…In particular, let $(X, Y)$ be a bivariate source characterized by a fixed joint probability law $p_{XY}$ and consider all Markov chains $U - X - Y - V$. The Double-Sided Information Bottleneck (DSIB) function is defined as [ 2 ]: $\max I(U; V)$, where the maximization is over all $p_{U|X}$ and $p_{V|Y}$ satisfying $I(U; X) \le R_U$ and $I(V; Y) \le R_V$. This problem is illustrated in Figure 1 .…”
Section: Introduction
confidence: 99%
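For very small alphabets, the maximization underlying the DSIB function can be approximated by brute force over deterministic binary encoders, with the one-bit cardinality standing in for the rate constraints. Everything below — the block-structured toy source and the helper names — is an illustrative sketch under those assumptions, not the paper's algorithm:

```python
import numpy as np
from itertools import product

def mi(p):
    """Mutual information (in nats) of a 2-D joint pmf."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

def best_binary_encoders(p_xy):
    """Exhaustively search deterministic maps f: X -> {0,1}, g: Y -> {0,1}
    maximizing I(f(X); g(Y)); returns (best value, best f, best g)."""
    nx, ny = p_xy.shape
    best = (0.0, None, None)
    for f in product((0, 1), repeat=nx):
        for g in product((0, 1), repeat=ny):
            p_uv = np.zeros((2, 2))  # joint pmf of the two encoder outputs
            for x in range(nx):
                for y in range(ny):
                    p_uv[f[x], g[y]] += p_xy[x, y]
            val = mi(p_uv)
            if val > best[0]:
                best = (val, f, g)
    return best

# Source with two perfectly separated correlated clusters {0,1} and {2,3}.
p_xy = np.array([[0.20, 0.05, 0.00, 0.00],
                 [0.05, 0.20, 0.00, 0.00],
                 [0.00, 0.00, 0.20, 0.05],
                 [0.00, 0.00, 0.05, 0.20]])
val, f, g = best_binary_encoders(p_xy)
print(val, f, g)  # the optimal encoders group each cluster together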
“…An optimization algorithm was presented that intertwines both row and column clustering at all stages. Distributed clustering from a proper information-theoretic perspective was first explicitly considered by Pichler et al. [ 2 ]. Consider the model illustrated in Figure 3 .…”
Section: Introduction
confidence: 99%