Optimal Data-Dependent Hashing for Approximate Near Neighbors

Andoni, Alexandr; Razenshteyn, Ilya

doi:10.1145/2746539.2746553

Cited by 178 publications

(203 citation statements)

References 30 publications

Supporting

Mentioning

200

Contrasting


“…Although this work focuses on applying angular LSH to sieving, more generally this work could be considered the first to succeed in applying LSH to lattice algorithms. Various recent followup works have already further investigated the use of different LSH methods [7,8] and other nearest neighbor search methods [9,11,38] in the context of lattice sieving [11–13,30,37], and an open problem is whether other lattice algorithms (e.g. provable sieving algorithms, the Voronoi cell algorithm [39]) can benefit from related techniques as well.…”

Section: Introduction

mentioning

confidence: 99%

Sieving for Shortest Vectors in Lattices Using Angular Locality-Sensitive Hashing

Laarhoven

2015

Lecture Notes in Computer Science

104

100


Abstract. By replacing the brute-force list search in sieving algorithms with Charikar's angular locality-sensitive hashing (LSH) method, we get both theoretical and practical speedups for solving the shortest vector problem (SVP) on lattices. Combining angular LSH with a variant of Nguyen and Vidick's heuristic sieve algorithm, we obtain heuristic time and space complexities for solving SVP in dimension n of $2^{0.3366n + o(n)}$ and $2^{0.2075n + o(n)}$ respectively, while combining the same ideas with Micciancio and Voulgaris' GaussSieve algorithm leads to a practical algorithm with (conjectured) time and space complexities bounded by $2^{0.3366n + o(n)}$, leading to the best complexities for solving SVP in high dimensions to date. Experiments show that in moderate dimensions the GaussSieve-based HashSieve algorithm already outperforms the GaussSieve, and the practical increase in the space complexity is smaller than the asymptotic bounds suggest, and can be further reduced with probing. Extrapolating to higher dimensions, we estimate that a fully optimized and parallelized implementation of the GaussSieve-based HashSieve algorithm might need a few core years to solve SVP in dimension 130 or even 140.
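The idea of replacing a brute-force list search with angular (hyperplane) LSH can be illustrated with a minimal Python sketch. This is not the HashSieve implementation from the cited paper: the function names (`make_hyperplane_hash`, `build_tables`, `candidates`) and the parameters `k` (bits per hash) and `t` (number of tables) are assumptions introduced here for illustration only.

```python
import random


def make_hyperplane_hash(dim, k, rng):
    """Return a hash mapping a vector to k sign bits (one per random hyperplane)."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

    def h(v):
        # Bit j records on which side of hyperplane j the vector lies.
        return tuple(int(sum(a * x for a, x in zip(p, v)) >= 0) for p in planes)

    return h


def build_tables(vectors, dim, k, t, seed=0):
    """Insert every vector (by index) into t independent hash tables."""
    rng = random.Random(seed)
    tables = []
    for _ in range(t):
        h = make_hyperplane_hash(dim, k, rng)
        buckets = {}
        for idx, v in enumerate(vectors):
            buckets.setdefault(h(v), []).append(idx)
        tables.append((h, buckets))
    return tables


def candidates(tables, query):
    """Collect indices that share a bucket with the query in at least one table."""
    found = set()
    for h, buckets in tables:
        found.update(buckets.get(h(query), []))
    return found


if __name__ == "__main__":
    vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    tables = build_tables(vecs, dim=3, k=4, t=5, seed=1)
    # Only the returned candidates need to be compared against the query,
    # instead of scanning the whole list.
    print(sorted(candidates(tables, [1.0, 0.05, 0.0])))
```

Vectors at a small angle to the query agree on most sign bits and therefore tend to land in the same bucket, so the candidate set is usually much smaller than the full list. Larger `k` makes buckets more selective, while larger `t` reduces the chance of missing a nearby vector.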


“…Comparing this result to Andoni et al.'s spherical hash functions h ∈ S [7,8] used in the SphereSieve [33], which have a collision probability of…”

mentioning

confidence: 70%

“…The main difference with previous work [32,33] lies in the choice of the hash function family, which in this paper is the efficient and asymptotically superior cross-polytope LSH, rather than the asymptotically worse angular or hyperplane LSH [13,32] or the less practical spherical LSH [8,33]. This leads to the CPSieve algorithm described in Algorithm 2.…”

Section: CPSieve: Sieving in Arbitrary Lattices

mentioning

confidence: 97%

Efficient (Ideal) Lattice Sieving Using Cross-Polytope LSH

Becker

Laarhoven

2016

Lecture Notes in Computer Science


Abstract. Combining the efficient cross-polytope locality-sensitive hash family of Terasawa and Tanaka with the heuristic lattice sieve algorithm of Micciancio and Voulgaris, we show how to obtain heuristic and practical speedups for solving the shortest vector problem (SVP) on both arbitrary and ideal lattices. In both cases, the asymptotic time complexity for solving SVP in dimension n is $2^{0.298n + o(n)}$. For any lattice, hashes can be computed in polynomial time, which makes our CPSieve algorithm much more practical than the SphereSieve of Laarhoven and De Weger, while the better asymptotic complexities imply that this algorithm will outperform the GaussSieve of Micciancio and Voulgaris and the HashSieve of Laarhoven in moderate dimensions as well. We performed tests to show this improvement in practice. For ideal lattices, by observing that the hash of a shifted vector is a shift of the hash value of the original vector and constructing rerandomization matrices which preserve this property, we obtain not only a linear decrease in the space complexity, but also a linear speedup of the overall algorithm. We demonstrate the practicability of our cross-polytope ideal lattice sieve IdealCPSieve by applying the algorithm to cyclotomic ideal lattices from the ideal SVP challenge and to lattices which appear in the cryptanalysis of NTRU.
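A cross-polytope hash can be sketched in a few lines of Python. This is a minimal illustration, not the CPSieve code: the name `make_cross_polytope_hash` is hypothetical, and a matrix of i.i.d. Gaussian entries is assumed here as a stand-in for a random rotation. The hash of a vector is the signed coordinate axis closest to the transformed vector, so there are 2·dim possible hash values.

```python
import random


def make_cross_polytope_hash(dim, rng):
    """Return a hash mapping a vector to (axis index, sign) of its largest
    transformed coordinate, i.e. the nearest vertex of the cross-polytope."""
    # Assumption: a Gaussian matrix stands in for a true random rotation.
    A = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def h(v):
        transformed = [sum(a * x for a, x in zip(row, v)) for row in A]
        i = max(range(dim), key=lambda j: abs(transformed[j]))
        return (i, 1 if transformed[i] >= 0 else -1)

    return h


if __name__ == "__main__":
    h = make_cross_polytope_hash(4, random.Random(7))
    v = [0.5, -1.25, 2.0, 0.75]
    print(h(v), h([-x for x in v]))  # same axis, opposite sign
```

Computing one hash costs a single matrix-vector product, which is consistent with the abstract's remark that hashes can be computed in polynomial time. The hash depends only on the direction of the vector, and negating the vector flips the sign component while keeping the axis.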


“…On the other hand, when the table size becomes closer to be linear of n, data structures such as locality-sensitive hashing (LSH) [2,12] or data-dependent LSH [3,4] achieve a cell-probe complexity of $\tilde{O}(dn^{\rho})$ with data structures of size $\tilde{O}(n^{1+\rho})$ for some 0 < ρ < 1 depending on the metric and the approximation ratio. Compared to the $\Theta\left(\frac{\log\log d}{\log\log\log d}\right)$ bound of Chakrabarti and Regev, the $\tilde{O}(dn^{\rho})$ cell-probe complexity is much worse.…”

mentioning

confidence: 99%
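To make the dependence on ρ concrete, the sketch below evaluates the two cost expressions from the statement above for Hamming space. The exponent formulas are background facts not stated on this page: ρ = 1/c for classic (data-independent) LSH and ρ = 1/(2c − 1) for the optimal data-dependent scheme, where c is the approximation ratio. Lower-order o(1) terms and the polylogarithmic factors hidden by the tilde are ignored, and all function names are hypothetical.

```python
def rho_classic_hamming(c):
    """Exponent of classic LSH in Hamming space (o(1) terms ignored)."""
    return 1.0 / c


def rho_data_dependent_hamming(c):
    """Exponent of optimal data-dependent hashing in Hamming space (o(1) terms ignored)."""
    return 1.0 / (2.0 * c - 1.0)


def query_cost(n, d, rho):
    """d * n^rho, the query bound without polylogarithmic factors."""
    return d * n ** rho


def space_cost(n, rho):
    """n^(1 + rho), the space bound without polylogarithmic factors."""
    return n ** (1.0 + rho)


if __name__ == "__main__":
    n, d, c = 2 ** 20, 128, 2.0
    for name, rho in [("classic", rho_classic_hamming(c)),
                      ("data-dependent", rho_data_dependent_hamming(c))]:
        print(name, rho, query_cost(n, d, rho), space_cost(n, rho))
```

For any c > 1 the data-dependent exponent is strictly smaller, so both the query and space estimates shrink. These are rough estimates of asymptotic bounds, not measured costs.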

“…This makes all cell-probes in LSH parallelizable into one round of parallel memory accesses. And the more recent data-dependent LSH [3,4] surpasses the classic LSH in cell-probe complexity by being a little more adaptive: the algorithm retrieves a data-dependent hash function before making the second round of cell-probes, while the cell-probes in the second round are independent of each other. In contrast, the algorithm of Chakrabarti and Regev [10] is fully adaptive: Every cell-probe must wait for the information retrieved by the previous cell-probe to proceed.…”

mentioning

confidence: 99%

Randomized Approximate Nearest Neighbor Search with Limited Adaptivity

Liu

Pan

Yin

2016

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures


We study the fundamental problem of approximate nearest neighbor search in d-dimensional Hamming space $\{0,1\}^d$. We study the complexity of the problem in the famous cell-probe model, a classic model for data structures. We consider algorithms in the cell-probe model with limited adaptivity, where the algorithm makes k rounds of parallel accesses to the data structure for a given k. For any k ≥ 1, we give a simple randomized algorithm solving the approximate nearest neighbor search using k rounds of parallel memory accesses, with $O(k(\log d)^{1/k})$ accesses in total. We also give a more sophisticated randomized algorithm using O(k + (O(1/k) ) memory accesses in k rounds for large enough k. Both algorithms use data structures of size polynomial in n, the number of points in the database. For the lower bound, we prove an Ω(lower bound for the total number of memory accesses required by any randomized algorithm solving the approximate nearest neighbor search within $k \le \frac{\log\log d}{2\log\log\log d}$ rounds of parallel memory accesses on any data structures of polynomial size. This lower bound shows that our first algorithm is asymptotically optimal for any constant round k. And our second algorithm approaches the asymptotically optimal tradeoff between rounds and memory accesses, in the sense that the lower bound of memory accesses for any $k_1$ rounds can be matched by the algorithm within $k_2 = O(k_1)$ rounds. In the extreme, for some large enough $k = \Theta\left(\frac{\log\log d}{\log\log\log d}\right)$, our second algorithm matches the $\Theta\left(\frac{\log\log d}{\log\log\log d}\right)$ tight bound for fully adaptive algorithms for approximate nearest neighbor search due to Chakrabarti and Regev [10].

Introduction

Nearest neighbor search is a fundamental theoretical problem in Computer Science, with enormously many applications in diverse fields. In the nearest neighbor search problem, we are given a database B of n points from a metric space X. The goal is to preprocess them into a data structure, such that given any query point x ∈ X, an algorithm with access to the data structure can find a database point in B that is closest to the query point x among all database points. An extensively studied case is when the metric space is the Hamming space $X = \{0,1\}^d$. It is conjectured that the nearest neighbor search is hard to solve by any data structures when the dimension d is high (e.g. d ≫ log n). This conjecture is sometimes referred to as a case of the "curse of dimensionality" and is one of the central problems in the area of data structure lower
