External perfect hashing for very large key sets

Botelho, Fabiano C.; Ziviani, Nívio

doi:10.1145/1321440.1321532

Cited by 42 publications

(49 citation statements)

References 28 publications


“…Therefore, we do not use real random hash functions, but just use the heuristic hash function proposed by Jenkins [14]. This function presents very good performance in practice [2,3], and outputs a 12 byte long integer, which is an interesting property for our implementation described below.…”

Section: A Practical Version (mentioning)

confidence: 99%

Hash, Displace, and Compress

Belazzougui

Botelho

Dietzfelbinger

2009

Lecture Notes in Computer Science

Self Cite


Abstract. A hash function h, i.e., a function from the set U of all keys to the range [m] = {0, . . . , m − 1}, is called a perfect hash function (PHF) for a subset S ⊆ U of size n ≤ m if h is 1-1 on S. The important performance parameters of a PHF are representation size, evaluation time and construction time. In this paper, we present an algorithm that permits to obtain PHFs with representation size very close to optimal while retaining O(n) construction time and O(1) evaluation time. For example, in the case m = 2n we obtain a PHF that uses space 0.67 bits per key, and for m = 1.23n we obtain space 1.4 bits per key, which was not achievable with previously known methods. Our algorithm is inspired by several known algorithms; the main new feature is that we combine a modification of Pagh's "hash-and-displace" approach with data compression on a sequence of hash function indices. That combination makes it possible to significantly reduce space usage while retaining linear construction time and constant query time. Our algorithm can also be used for k-perfect hashing, where at most k keys may be mapped to the same value. For the analysis we assume that fully random hash functions are given for free; such assumptions can be justified and were made in previous papers.
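
The hash-and-displace idea named in this abstract can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' algorithm: BLAKE2b with a seed stands in for the fully random hash functions, the names `build_phf`, `lookup`, `bucket_size` and `max_seed` are hypothetical, and the compression of the resulting index sequence is not shown. Keys are split into buckets, and for each bucket (largest first) the sketch searches for the first hash function index that sends all keys of the bucket to free, distinct slots in [m].

```python
import hashlib


def _h(key: str, seed: int, mod: int) -> int:
    """Seeded hash into [0, mod). BLAKE2b is an assumed stand-in for a random function."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % mod


def build_phf(keys, m, bucket_size=4, max_seed=1 << 16):
    """Return one hash function index (seed) per bucket; keys must be distinct."""
    keys = list(keys)
    r = max(1, len(keys) // bucket_size)  # number of buckets
    buckets = [[] for _ in range(r)]
    for k in keys:
        buckets[_h(k, 0, r)].append(k)    # seed 0 is reserved for bucket assignment

    occupied = [False] * m
    seeds = [0] * r
    # Large buckets are hardest to place, so handle them while the table is empty.
    for b in sorted(range(r), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        for seed in range(1, max_seed):
            slots = [_h(k, seed, m) for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(occupied[s] for s in slots):
                for s in slots:
                    occupied[s] = True
                seeds[b] = seed
                break
        else:
            raise RuntimeError("no index found; increase m or max_seed")
    return seeds


def lookup(key, seeds, m):
    """Evaluate the PHF: pick the bucket, then apply that bucket's hash function."""
    return _h(key, seeds[_h(key, 0, len(seeds))], m)
```

Because buckets are processed in order of decreasing size, most buckets succeed with a small index, which is what makes the index sequence a good candidate for compression.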


“…Preliminary partial results of this paper appeared in [8,10]. In [8] we describe the RAM algorithm, but both the description and the analysis of the algorithm are sketchy and incomplete.…”

Section: Contributions (mentioning)

confidence: 99%

“…In [8] we describe the RAM algorithm, but both the description and the analysis of the algorithm are sketchy and incomplete. In [10] we describe the EM algorithm. We present in this paper significant improvements and extensions on those results.…”

Section: Contributions (mentioning)

confidence: 99%

Practical perfect hashing in nearly optimal space

Botelho

Pagh

Ziviani

2013

Information Systems

Self Cite


“…MWHC functions require a large amount of memory to be built, as they require random access to the 3-hypergraph to perform a visit. To make their construction suitable for large-size key sets we reuse some techniques from [4]: we divide keys into chunks using a hash function, and build a separate MWHC function for each chunk. We must now store for each chunk the offset in the array a where the data relative to the chunk is written, but using a chunk size ω(log n) (say, log n log log n) the space is negligible.…”

Section: Storing Functions (mentioning)

confidence: 99%

“…We must now store for each chunk the offset in the array a where the data relative to the chunk is written, but using a chunk size ω(log n) (say, log n log log n) the space is negligible. The careful analysis in [4] shows that this approach can be made to work even at a theoretical level by carefully reusing the random bits when building the MWHC functions of each chunk.…”

Section: Storing Functions (mentioning)

confidence: 99%
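
The chunking scheme described in the two statements above (split the keys into chunks with a hash function, build one function per chunk, and store a per-chunk offset) can be sketched as follows. This is a hedged illustration only: a brute-force seed search stands in for the MWHC construction, BLAKE2b stands in for the hash functions, and the names `build_chunked`, `lookup`, `avg_chunk` and `max_seed` are hypothetical. The stand-in is only practical for very small chunks.

```python
import hashlib
from itertools import accumulate


def _h(key: str, seed: int, mod: int) -> int:
    """Seeded hash into [0, mod). BLAKE2b is an assumed stand-in hash function."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % mod


def build_chunked(keys, avg_chunk=4, max_seed=1 << 20):
    """Return (offsets, seeds) for a chunked minimal perfect hash over distinct keys."""
    keys = list(keys)
    n_chunks = max(1, len(keys) // avg_chunk)
    chunks = [[] for _ in range(n_chunks)]
    for k in keys:
        chunks[_h(k, 0, n_chunks)].append(k)  # seed 0 assigns keys to chunks

    # offsets[c] is the number of keys in chunks before c; offsets[-1] == len(keys).
    offsets = [0] + list(accumulate(len(c) for c in chunks))

    seeds = []
    for chunk in chunks:
        size, seed = len(chunk), 1
        # Stand-in for a per-chunk MWHC function: try seeds until the chunk's
        # keys map one-to-one onto [0, size).
        while size and len({_h(k, seed, size) for k in chunk}) != size:
            seed += 1
            if seed >= max_seed:
                raise RuntimeError("no seed found; use smaller chunks")
        seeds.append(seed)
    return offsets, seeds


def lookup(key, offsets, seeds):
    """Global position = chunk offset + position inside the chunk (keys in the set only)."""
    c = _h(key, 0, len(seeds))
    size = offsets[c + 1] - offsets[c]
    return offsets[c] + _h(key, seeds[c], size)
```

Each chunk is built independently, so only one chunk has to be in memory at a time; the offsets array is the only global state.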

Theory and Practise of Monotone Minimal Perfect Hashing

Belazzougui

Boldi

Pagh

et al. 2009

2009 Proceedings of the Eleventh Workshop on Algorithm Engineering and Experiments (ALENEX)


Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case of dictionaries of search engines, list of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along our way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practise, and provide a balance between access speed, ease of construction, and space usage.
