Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management 2007
DOI: 10.1145/1321440.1321532

External perfect hashing for very large key sets

Abstract: A perfect hash function (PHF) h : S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory-efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable and near-space optimal perfect h…
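
The two definitions in the abstract can be checked mechanically on a small key set. The sketch below is only an illustration of the definitions, not the paper's construction; the helper names `is_perfect` and `is_minimal_perfect` are hypothetical, and the keys are assumed to be distinct.

```python
from typing import Callable, Hashable, Iterable


def is_perfect(h: Callable[[Hashable], int], keys: Iterable[Hashable], m: int) -> bool:
    """Return True if h sends every key to a distinct value in [0, m - 1]."""
    seen = set()
    for key in keys:
        value = h(key)
        # Out-of-range values and collisions both violate the PHF definition.
        if not 0 <= value < m or value in seen:
            return False
        seen.add(value)
    return True


def is_minimal_perfect(h: Callable[[Hashable], int], keys: Iterable[Hashable], m: int) -> bool:
    """Return True if h is a PHF on keys and the range size m equals n = |S|."""
    key_list = list(keys)
    return m == len(key_list) and is_perfect(h, key_list, m)


# Toy key set; a lookup table stands in for a real hash function here.
S = ["apple", "banana", "cherry"]
table = {key: i for i, key in enumerate(S)}
h = table.__getitem__

phf_with_slack = is_perfect(h, S, 5)          # injective, but m > n
mphf = is_minimal_perfect(h, S, 3)            # injective and m == n
colliding = is_perfect(lambda k: len(k) % 3, S, 3)  # "banana" and "cherry" collide
```

With m = 5 the table-backed function is a PHF but not minimal; with m = 3 it is an MPHF. The length-based function fails because two keys share a value.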

Cited by 42 publications (49 citation statements)
References 28 publications
“…Therefore, we do not use real random hash functions, but just use the heuristic hash function proposed by Jenkins [14]. This function presents very good performance in practice [2,3], and outputs a 12-byte-long integer, which is an interesting property for our implementation described below.…”
Section: A a Practical Version
Citation type: mentioning
confidence: 99%
“…Preliminary partial results of this paper appeared in [8,10]. In [8] we describe the RAM algorithm, but both the description and the analysis of the algorithm are sketchy and incomplete.…”
Section: Contributions
Citation type: mentioning
confidence: 99%
“…In [8] we describe the RAM algorithm, but both the description and the analysis of the algorithm are sketchy and incomplete. In [10] we describe the EM algorithm. We present in this paper significant improvements and extensions on those results.…”
Section: Contributions
Citation type: mentioning
confidence: 99%
“…MWHC functions require a large amount of memory to be built, as they require random access to the 3-hypergraph to perform a visit. To make their construction suitable for large-size key sets we reuse some techniques from [4]: we divide keys into chunks using a hash function, and build a separate MWHC function for each chunk. We must now store for each chunk the offset in the array a where the data relative to the chunk is written, but using a chunk size ω(log n) (say, log n log log n) the space is negligible.…”
Section: Storing Functions
Citation type: mentioning
confidence: 99%
“…We must now store for each chunk the offset in the array a where the data relative to the chunk is written, but using a chunk size ω(log n) (say, log n log log n) the space is negligible. The careful analysis in [4] shows that this approach can be made to work even at a theoretical level by carefully reusing the random bits when building the MWHC functions of each chunk.…”
Section: Storing Functions
Citation type: mentioning
confidence: 99%
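
The chunking idea quoted in the "Storing Functions" statements (hash each key to a chunk, build one structure per chunk, and record where each chunk's data starts in a shared array a) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the MWHC construction: each chunk's data is just a sorted list of its keys searched by bisection, whereas a real MWHC function would not store the keys at all. The names `chunk_of` and `ChunkedStore`, the use of BLAKE2b as the chunking hash, and the chunk count are all hypothetical choices.

```python
import bisect
import hashlib
from typing import List, Sequence


def chunk_of(key: str, num_chunks: int) -> int:
    """Assign a key to a chunk with a fixed hash (BLAKE2b is an assumption)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_chunks


class ChunkedStore:
    """Per-chunk structures written back to back in one array, plus offsets."""

    def __init__(self, keys: Sequence[str], num_chunks: int) -> None:
        self.num_chunks = num_chunks
        chunks: List[List[str]] = [[] for _ in range(num_chunks)]
        for key in keys:
            chunks[chunk_of(key, num_chunks)].append(key)

        self.a: List[str] = []        # shared array holding every chunk's data
        self.offsets: List[int] = []  # offsets[c] = start of chunk c inside a
        for chunk in chunks:
            self.offsets.append(len(self.a))
            # Placeholder per-chunk structure: sorted keys (NOT an MWHC function).
            self.a.extend(sorted(chunk))
        self.offsets.append(len(self.a))  # sentinel marking the end of the last chunk

    def lookup(self, key: str) -> int:
        """Return the key's position in a, searching only its own chunk."""
        c = chunk_of(key, self.num_chunks)
        lo, hi = self.offsets[c], self.offsets[c + 1]
        pos = bisect.bisect_left(self.a, key, lo, hi)
        if pos == hi or self.a[pos] != key:
            raise KeyError(key)
        return pos


keys = [f"key{i}" for i in range(1000)]
store = ChunkedStore(keys, num_chunks=16)
positions = sorted(store.lookup(k) for k in keys)
```

Because the chunks occupy disjoint, contiguous slices of a, each key resolves to a distinct position in [0, n − 1], and construction only ever touches one chunk's keys at a time. The offsets array is the only per-chunk overhead, which is the quantity the quoted statement argues is negligible for chunks of size ω(log n).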