Compression Schemes for Similarity Queries

Ochoa, Idoia; Ingber, Amir; Weissman, Tsachy

doi:10.1109/dcc.2014.37

Cited by 4 publications

(6 citation statements)

References 14 publications

“…The TC-△ scheme is an improved version of the LC-△ scheme that jointly optimizes the quantization distortion and the expected query codeword distance. The results in [5] show that the compression rate of the TC-△ scheme can achieve the identification rate for binary sources with the Hamming distance.…”

Section: Introduction (mentioning, confidence: 92%)

“…Proof. Given the stationary Gaussian source $\{X_n\}$, we can decompose the source into vectors $\mathbf{X}$ of $M$ successive random variables and describe those vectors with an $M$th-order multivariate Gaussian distribution (5). Then we can apply the KLT to the decomposed vectors, $\tilde{\mathbf{X}} = A_M^T \mathbf{X}$, where $A_M$ is the eigenmatrix of the covariance matrix $C_M$.…”

Section: Identification Rate of Gaussian Sources with Memory (mentioning, confidence: 99%)
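The block decomposition and KLT step in the proof above can be sketched numerically. This is a minimal sketch under stated assumptions: the test source is a hypothetical unit-variance AR(1) (Gauss-Markov) process, and $C_M$ is replaced by its sample estimate. The transformed blocks are therefore decorrelated exactly with respect to that estimate, and only approximately with respect to the true $C_M$. The function names `ar1_gaussian` and `klt_blocks` are illustrative and do not come from the cited paper.

```python
import numpy as np


def ar1_gaussian(n: int, rho: float, seed: int = 0) -> np.ndarray:
    """Hypothetical test source: a unit-variance stationary Gauss-Markov (AR(1)) process."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.standard_normal()                        # start in the stationary distribution
    w = rng.standard_normal(n) * np.sqrt(1.0 - rho**2)  # innovation noise
    for t in range(1, n):
        x[t] = rho * x[t - 1] + w[t]
    return x


def klt_blocks(x: np.ndarray, M: int):
    """Cut a sample path into vectors of M successive samples and apply the KLT.

    Returns (Y, A_M, eigvals): row k of Y is A_M^T applied to the k-th block,
    A_M holds eigenvectors of the estimated covariance C_M as columns, and
    eigvals are the component variances in descending order.
    """
    n_blocks = len(x) // M
    X = x[: n_blocks * M].reshape(n_blocks, M)  # one length-M vector per row
    C_M = np.cov(X, rowvar=False)               # sample estimate of C_M
    eigvals, A_M = np.linalg.eigh(C_M)          # C_M = A_M diag(eigvals) A_M^T
    order = np.argsort(eigvals)[::-1]           # strongest component first
    eigvals, A_M = eigvals[order], A_M[:, order]
    Y = X @ A_M                                 # row-vector form of A_M^T X
    return Y, A_M, eigvals


x = ar1_gaussian(20_000, rho=0.9)
Y, A_M, lam = klt_blocks(x, M=4)
print(np.round(np.cov(Y, rowvar=False), 6))  # diagonal up to rounding: components are decorrelated
```

Sorting by eigenvalue is a convenience, not a requirement of the KLT. It simply lists the components from largest to smallest variance.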

“…However, the optimal $D_{\mathrm{ID}}$-admissible system is difficult to achieve due to the triangle-inequality constraint that most distortion measures possess. The state-of-the-art practical schemes for the similarity identification problem are the triangle-inequality-based TC-△ and LC-△ schemes proposed in [5], where the TC-△ scheme consistently performs better than the LC-△ scheme. Therefore, we replace the ideal scheme with the practical TC-△ scheme for each component.…”

Section: Component-based Model with Practical Schemes (mentioning, confidence: 99%)

“…Since correlated data are common in the real world, it is of interest to investigate similarity identification schemes for correlated sources. [5] uses lossy compression as a building block to construct the TC-△ (Type Covering signatures and triangle-inequality decision rule) scheme and the LC-△ (Lossy Compression signatures and triangle-inequality decision rule) scheme. The LC-△ scheme optimizes only the quantization distortion and can be realized by combining a rate-distortion code with the triangle-inequality principle.…”

Section: Introduction (mentioning, confidence: 99%)
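The idea of pairing a lossy-compression signature with a triangle-inequality decision rule can be illustrated for binary sequences under the Hamming distance. This is a toy sketch in that spirit, not the LC-△ construction from [5]. Three elements are assumptions made for brevity: the block-majority quantizer, which stands in for a real rate-distortion code; storing $d(x, \hat{x})$ inside the signature; and all function names. Because $d(x, y) \ge d(y, \hat{x}) - d(x, \hat{x})$, the answer "no" is given only when the pair is certainly not $D$-similar. As a result, false negatives cannot occur, and only false positives remain.

```python
import random


def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(u != v for u, v in zip(a, b))


def block_majority_quantizer(x, block=3):
    """Hypothetical stand-in for a lossy source code: every block of bits is
    replaced by its majority bit (ties go to 0), i.e. roughly 1/block bits per symbol."""
    xhat = []
    for i in range(0, len(x), block):
        chunk = x[i:i + block]
        bit = 1 if 2 * sum(chunk) > len(chunk) else 0
        xhat.extend([bit] * len(chunk))
    return xhat


def signature(x, block=3):
    """Signature of x: the reconstruction xhat plus the distortion d(x, xhat).
    Storing d(x, xhat) explicitly is an assumption of this sketch."""
    xhat = block_majority_quantizer(x, block)
    return xhat, hamming(x, xhat)


def decide(sig, y, D):
    """Triangle-inequality decision rule.

    Since d(x, y) >= d(y, xhat) - d(x, xhat), a gap larger than D proves the
    pair is not D-similar ("no"); otherwise the answer is "maybe".
    """
    xhat, dist = sig
    return "maybe" if hamming(y, xhat) - dist <= D else "no"


# Usage: count false positives over random query pairs (false negatives cannot occur).
rng = random.Random(1)
n, D, trials = 30, 6, 2000
false_pos = 0
for _ in range(trials):
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [b ^ (rng.random() < 0.25) for b in x]  # flip each bit with probability 0.25
    if decide(signature(x), y, D) == "maybe" and hamming(x, y) > D:
        false_pos += 1
print(f"false-positive rate: {false_pos / trials:.3f}")
```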

Computing Similarity Queries for Correlated Gaussian Sources

Wu, Wang, Flierl · 2020 · Preprint

In many current data processing systems, the objective is often not to reproduce the data but to compute answers to queries based on it. The similarity identification task is to identify the items in a database that are similar to a given query item under a given metric. The problem of compression for similarity identification has been studied in [1]. Unlike in classical compression problems, the focus is not on reconstructing the original data. Instead, the compression rate is determined by the desired reliability of the answers. Specifically, an information measure called the identification rate characterizes the minimum rate achievable among all schemes that guarantee reliable answers with respect to a given similarity threshold. In this paper, we propose a component-based model for computing correlated similarity queries. The correlated signals are first decorrelated by the Karhunen-Loève transform (KLT). Then, each component of the decorrelated signal is processed by its own D-admissible system. We show that the component-based model equipped with the KLT can perfectly represent multivariate Gaussian similarity queries when optimal rate-similarity allocation is applied. Hence, we can derive the identification rate of multivariate Gaussian signals from the component-based model. We then extend the result to general Gaussian sources with memory. We also study models equipped with practical component systems. As component systems, we use TC-schemes, which use type covering signatures and triangle-inequality decision rules [1]. We propose an iterative method to numerically approximate the minimum achievable rate of the TC-scheme. We show that our component-based model equipped with TC-schemes achieves better performance on multivariate Gaussian sources than the TC-scheme alone.
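One property that makes per-component processing natural for a quadratic (squared-Euclidean) similarity measure is that the KLT is orthonormal. It therefore preserves $\lVert x - y \rVert^2$, so a query $\lVert x - y \rVert^2 \le D$ can be evaluated as a sum of per-component terms in the transform domain. The following is a minimal sketch that checks this property, assuming a hypothetical AR(1)-style covariance $C_M$ with entries $\rho^{|i-j|}$. It does not implement the per-component D-admissible systems or the rate-similarity allocation described in the abstract, and all names are illustrative.

```python
import numpy as np


def ar1_covariance(M: int, rho: float) -> np.ndarray:
    """Hypothetical covariance C_M with entries rho**|i - j| (an AR(1)-style correlation)."""
    idx = np.arange(M)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def klt_matrix(C: np.ndarray) -> np.ndarray:
    """Orthonormal eigenmatrix A of a symmetric covariance matrix C (eigenvectors as columns)."""
    _, A = np.linalg.eigh(C)
    return A


def component_sq_distances(A: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-component squared differences between the KLT coefficients A^T x and A^T y."""
    return (A.T @ (x - y)) ** 2


def is_similar(A, x, y, D):
    """Quadratic similarity query ||x - y||^2 <= D, evaluated component by component."""
    return component_sq_distances(A, x, y).sum() <= D


rng = np.random.default_rng(0)
C = ar1_covariance(4, rho=0.8)
A = klt_matrix(C)
L = np.linalg.cholesky(C)  # draw x, y ~ N(0, C) as L @ z with z standard normal
x, y = L @ rng.standard_normal(4), L @ rng.standard_normal(4)
d = component_sq_distances(A, x, y)
print(d.sum(), np.sum((x - y) ** 2))  # equal up to floating-point rounding
```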

Computing Similarity Queries for Correlated Gaussian Sources

Wu, Wang, Flierl · 2020 · Preprint

“…the TC-△ and the LC-△ schemes are optimal. However, if X and Y are not equiprobable (and the distortion measure is still Hamming), the LC-△ scheme differs from the TC-△ scheme (see [35, Fig. 2]).…”

Section: Special Cases (mentioning, confidence: 99%)
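The equiprobable-versus-non-equiprobable distinction above concerns binary sources under Hamming distortion. In that setting, the lossy-compression building block operates on the classical rate-distortion curve, which for a Bernoulli($p$) source is $R(D) = h(p) - h(D)$ for $0 \le D < \min(p, 1-p)$ and $0$ otherwise. Here $h$ is the binary entropy. The sketch below only evaluates this textbook function to show how the curve changes with $p$. It does not compute identification rates or the TC-△ and LC-△ rates that the quote compares.

```python
import math


def h2(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def binary_rate_distortion(p: float, D: float) -> float:
    """R(D) in bits/symbol of a Bernoulli(p) source under Hamming distortion."""
    if D >= min(p, 1 - p):
        return 0.0
    return h2(p) - h2(D)


# Usage: compare the equiprobable source (p = 0.5) with a biased one (p = 0.2).
for p in (0.5, 0.2):
    print(p, [round(binary_rate_distortion(p, D), 3) for D in (0.0, 0.05, 0.1)])
```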

The Minimal Compression Rate for Similarity Identification

Ingber, Weissman · 2013 · Preprint · Self-citation

Traditionally, data compression deals with the problem of concisely representing a data source, e.g. a sequence of letters, for the purpose of eventual reproduction (either exact or approximate). In this work we are interested in the case where the goal is to answer similarity queries about the compressed sequence, i.e. to identify whether or not the original sequence is similar to a given query sequence. We study the fundamental tradeoff between the compression rate and the reliability of the queries performed on compressed data. For i.i.d. sequences, we characterize the minimal compression rate that allows reliable query answers, in the sense of a vanishing false-positive probability when false negatives are not allowed. The result is partially based on previous work by Ahlswede et al. [1], and the inherently typical subset lemma plays a key role in the converse proof. We then characterize the compression rate achievable by schemes that use lossy source codes as a building block, and show that such schemes are, in general, suboptimal. Finally, we tackle the problem of evaluating the minimal compression rate by converting it to a sequence of convex programs that can be solved efficiently.
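The setup in this abstract has three parts: a compressed signature of $x$, a decision that may answer "no" only when $x$ and the query $y$ are certainly dissimilar (no false negatives), and a false-positive probability to be made small. A deliberately crude toy can make this concrete. The sketch is an illustration under assumptions, not a scheme from the paper. The signature is simply the Hamming weight $w(x)$, which takes about $\log_2(n+1)$ bits. The decision uses the reverse triangle inequality $|w(x) - w(y)| \le d(x, y)$. Both error probabilities are computed exactly by enumerating all pairs of i.i.d. uniform binary sequences of a small length $n$.

```python
from itertools import product


def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(u != v for u, v in zip(a, b))


def weight_signature(x):
    """Hypothetical signature: the Hamming weight of x (about log2(n + 1) bits)."""
    return sum(x)


def decide(sig, y, D):
    """|w(x) - w(y)| <= d(x, y), so a weight gap larger than D proves
    dissimilarity ("no"); otherwise answer "maybe"."""
    return "maybe" if abs(sig - sum(y)) <= D else "no"


def exact_error_probabilities(n, D):
    """Exact (false-negative, false-positive) probabilities for x, y i.i.d. uniform bits."""
    total = 4 ** n
    false_neg = false_pos = 0
    for x in product((0, 1), repeat=n):
        sig = weight_signature(x)
        for y in product((0, 1), repeat=n):
            similar = hamming(x, y) <= D
            answer = decide(sig, y, D)
            if similar and answer == "no":
                false_neg += 1
            elif not similar and answer == "maybe":
                false_pos += 1
    return false_neg / total, false_pos / total


print(exact_error_probabilities(n=8, D=2))
```

The false-negative probability is always zero by construction. The false-positive probability is what a better (higher-rate) signature would have to reduce.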

Transform-based compression for quadratic similarity queries

Flierl · 2017 · 2017 51st Asilomar Conference on Signals, Systems, and Computers
