2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS 2014)
DOI: 10.1109/FOCS.2014.68
Spectral Approaches to Nearest Neighbor Search

Abstract: We study spectral algorithms for the high-dimensional Nearest Neighbor Search problem (NNS). In particular, we consider a semi-random setting where a dataset P in R^d is chosen arbitrarily from an unknown subspace of low dimension k ≪ d, and then perturbed by fully d-dimensional Gaussian noise. We design spectral NNS algorithms whose query time depends polynomially on d and log n (where n = |P|) for large ranges of k, d and n. Our algorithms use a repeated computation of the top PCA vector/subspace, and are eff…
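The abstract sketches the high-level approach: use PCA to recover the hidden low-dimensional subspace and answer queries there. As a minimal illustration (our own sketch under assumed parameters, not the paper's actual iterative algorithm; the function name and noise scales are hypothetical), a single top-k PCA projection followed by a brute-force query looks like this in Python:

```python
# Illustrative sketch only: one top-k PCA projection + brute-force NN.
# The paper's algorithms use *repeated* top-PCA computations; this just shows
# why projecting onto the PCA subspace denoises a "subspace + Gaussian noise"
# dataset before the nearest-neighbor query.
import numpy as np

def spectral_nns(P, q, k):
    """Project the dataset P (n x d) and query q (d,) onto the top-k PCA
    subspace of P, then answer the query by brute force in k dimensions."""
    mean = P.mean(axis=0)
    centered = P - mean
    # Right singular vectors of the centered data span the PCA subspace.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T                  # d x k orthonormal basis
    P_proj = centered @ basis         # n x k denoised coordinates
    q_proj = (q - mean) @ basis       # query in the same coordinates
    return int(np.argmin(np.linalg.norm(P_proj - q_proj, axis=1)))

# Semi-random instance: points in a random k-dim subspace of R^d, plus noise.
rng = np.random.default_rng(0)
n, d, k = 500, 100, 5
A = rng.standard_normal((k, d))
P = rng.standard_normal((n, k)) @ A + 0.1 * rng.standard_normal((n, d))
q = P[42] + 0.05 * rng.standard_normal(d)   # noisy copy of point 42
print(spectral_nns(P, q, k))                # expected to print 42
```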

Cited by 15 publications (15 citation statements)
References 47 publications
“…We stress that this approach gives improvement for worstcase datasets, which is somewhat unexpected. To put this into a perspective: if one were to assume that the dataset has some special structure, it would be more natural to expect speed-ups with data-dependent hashing: such hashing may adapt to the special structure, perhaps implicitly, as was done in, say, [9,29,1]. However, in our setting there is no assumed structure to adapt to, and hence it is unclear why data-dependent hashing shall help.…”
Section: Introduction (mentioning)
Confidence: 92%
“…First, we show how to achieve success probability $n^{-\rho}$, query time $n^{o_c(1)}$, and space and preprocessing time $n^{1+o_c(1)}$, where $\rho = \frac{1}{2c^2 - 1} + o_c(1)$. Finally, to obtain the final result, one then builds $O(n^{\rho})$ copies of the above data structure to amplify the probability of success to 0.99 (as explained in Remark 2).…”
Section: The Data Structure (mentioning)
Confidence: 97%
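The amplification step quoted above is the standard independent-repetition argument; as a sanity check (our own calculation, not quoted from the cited paper): if one copy of the data structure succeeds with probability $n^{-\rho}$, then with $m = \lceil \ln(100)\, n^{\rho} \rceil$ independent copies,

$$\Pr[\text{all copies fail}] \le \left(1 - n^{-\rho}\right)^{m} \le e^{-m n^{-\rho}} \le e^{-\ln 100} = 0.01,$$

so at least one copy succeeds with probability $0.99$, at the cost of an $O(n^{\rho})$ factor in space and preprocessing time.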
“…Most such examples include algorithms that assume some additional structure in the dataset, such as a notion of low intrinsic dimension [KR02, CNBM01, KL04, BKL06, IN07, Cla06, DF08], or a low-dimensional dataset with high-dimensional noise [AAKK14]. Most relevant to us is the work of [DS13], which, while mostly focusing on low-intrinsic-dimension datasets, gives a generic bound for worst-case datasets as well.…”
Section: Related Work (mentioning)
Confidence: 99%