Abstract. A perfect hash function (PHF) h : U → [0, m − 1] for a key set S is a function that maps the keys of S to unique values. The minimum amount of space to represent a PHF for a given set S is known to be approximately 1.44n²/m bits, where n = |S|. In this paper we present new algorithms for construction and evaluation of PHFs of a given set (for m = n and m = 1.23n), with the following properties:
1. Evaluation of a PHF requires constant time.
2. The algorithms are simple to describe and implement, and run in linear time.
3. The amount of space needed to represent the PHFs is around a factor 2 from the information-theoretical minimum.
No previously known algorithm has these properties. To our knowledge, any algorithm in the literature with the third property either:
- requires exponential time for construction and evaluation, or
- uses near-optimal space only asymptotically, for extremely large n.
Thus, our main contribution is a scheme that gives low space usage for realistic values of n. The main technical ingredient is a new way of basing PHFs on random hypergraphs. Previously, this approach has been used to design simple PHFs with superlinear space usage.³

⋆ This work was supported in part by GERINDO Project-grant MCT/CNPq/CT-INFO 552.087/02-5, and CNPq Grants 30.5237/02-0 (Nivio Ziviani) and 142786/2006-3 (Fabiano C. Botelho).

³ This version of the paper is identical to the one published in the WADS 2007 proceedings. Unfortunately, it does not give reference and credit to (i) Chazelle et al. [5], who present a way of constructing PHFs that is equivalent to the ones presented in this paper. It is explained as a modification of the "Bloomier Filter" data structure, but it is not made explicit that a PHF is constructed. We have independently designed an algorithm to construct a PHF that maps keys from a key set S of size n to the range [0, (2.0 + ε)n − 1] based on random 2-graphs, where ε > 0. The resulting functions require 2.0 + ε bits per key to be stored. Nor does it credit (ii) Belazzougui [3], who suggested a method to construct PHFs that map to the range [0, (1.23 + ε)n − 1] based on random 3-graphs. The resulting functions are stored in 2.46 bits per key, and this space usage was further improved to 1.95 bits per key by using arithmetic coding. Thus, the simple construction of a PHF described here must be attributed to Chazelle et al. The new contribution of this paper is to analyze and optimize the constant of the space usage, considering implementation aspects, and to give a way of constructing MPHFs from those PHFs.
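The random 3-graph construction discussed above can be sketched as follows, assuming its commonly described form: each key becomes an edge with one vertex in each third of m ≈ 1.23n vertices; the hypergraph is peeled by repeatedly removing a vertex of degree 1 together with its edge; and, in reverse peeling order, each edge's free vertex v receives g[v] ∈ {0, 1, 2} so that (g[v0] + g[v1] + g[v2]) mod 3 selects v. Vertices never chosen keep the value 3 (≡ 0 mod 3), so every entry fits in 2 bits, which gives about 2 × 1.23 = 2.46 bits per key. The MPHF is the rank of the selected vertex among the n assigned vertices. This is a minimal sketch, not the authors' implementation: seeded BLAKE2b hashing stands in for fully random hash functions, g is a plain list instead of a packed 2-bit array, and the names `build`, `phf`, `build_rank` and `mphf` are hypothetical.

```python
import hashlib
import math
from collections import deque

UNASSIGNED = 3  # g value of vertices never chosen; note 3 ≡ 0 (mod 3)


def _hash(key: str, seed: int, i: int, size: int) -> int:
    """Seeded hash into [0, size); stands in for fully random hash functions."""
    data = f"{seed}:{i}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little") % size


def _edge(key: str, seed: int, part: int) -> tuple:
    # One vertex from each of three disjoint parts, so the 3 vertices are distinct.
    return tuple(i * part + _hash(key, seed, i, part) for i in range(3))


def _peel(edges, m):
    """Return [(edge_id, free_vertex)] in peeling order, or None if a 2-core remains."""
    incident = [set() for _ in range(m)]
    for e_id, e in enumerate(edges):
        for v in e:
            incident[v].add(e_id)
    queue = deque(v for v in range(m) if len(incident[v]) == 1)
    order = []
    while queue:
        v = queue.popleft()
        if len(incident[v]) != 1:
            continue  # degree changed since v was queued
        e_id = incident[v].pop()
        order.append((e_id, v))
        for u in edges[e_id]:
            incident[u].discard(e_id)
            if len(incident[u]) == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None


def build(keys, c=1.23, max_tries=200):
    """Build g over m = 3 * ceil(c * n / 3) vertices; retry new seeds until peelable."""
    part = max(1, math.ceil(c * len(keys) / 3))
    m = 3 * part
    for seed in range(max_tries):
        edges = [_edge(k, seed, part) for k in keys]
        order = _peel(edges, m)
        if order is None:
            continue
        g = [UNASSIGNED] * m
        # In reverse peeling order the other vertices of each edge already hold final values.
        for e_id, v in reversed(order):
            e = edges[e_id]
            others = sum(g[u] for u in e if u != v)
            g[v] = (e.index(v) - others) % 3
        return seed, part, g
    raise RuntimeError("no peelable hypergraph found; try a larger c")


def phf(key, seed, part, g):
    """PHF into [0, m): the edge vertex selected by (g[v0] + g[v1] + g[v2]) mod 3."""
    e = _edge(key, seed, part)
    return e[sum(g[v] for v in e) % 3]


def build_rank(g, block=64):
    """Number of assigned entries before each block boundary (sampled rank)."""
    samples, count = [], 0
    for i, val in enumerate(g):
        if i % block == 0:
            samples.append(count)
        count += val != UNASSIGNED
    return samples


def mphf(key, seed, part, g, samples, block=64):
    """MPHF into [0, n): rank of phf(key) among the n assigned vertices."""
    p = phf(key, seed, part, g)
    start = p - p % block
    return samples[p // block] + sum(1 for val in g[start:p] if val != UNASSIGNED)
```

Typical use is `seed, part, g = build(keys)` followed by `mphf(key, seed, part, g, build_rank(g))`; the same `block` size must be passed to `build_rank` and `mphf`.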
Abstract. A hash function h, i.e., a function from the set U of all keys to the range [m] = {0, …, m − 1}, is called a perfect hash function (PHF) for a subset S ⊆ U of size n ≤ m if h is 1-1 on S. The important performance parameters of a PHF are representation size, evaluation time and construction time. In this paper, we present an algorithm that obtains PHFs with representation size very close to optimal while retaining O(n) construction time and O(1) evaluation time. For example, in the case m = 2n we obtain a PHF that uses space 0.67 bits per key, and for m = 1.23n we obtain space 1.4 bits per key, which was not achievable with previously known methods. Our algorithm is inspired by several known algorithms; the main new feature is that we combine a modification of Pagh's "hash-and-displace" approach with data compression on a sequence of hash function indices. That combination makes it possible to significantly reduce space usage while retaining linear construction time and constant query time. Our algorithm can also be used for k-perfect hashing, where at most k keys may be mapped to the same value. For the analysis, we assume that fully random hash functions are given for free; such assumptions can be justified and were made in previous papers.
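A hash-and-displace scheme of the kind this abstract builds on can be sketched as follows: keys are split into small buckets by a first-level hash, buckets are processed from largest to smallest, and each bucket records the first index ℓ of a hash function family that sends all of its keys to slots that still have room. The resulting sequence of indices is what the abstract proposes to compress; this sketch stores it uncompressed. It assumes a simple seeded family (BLAKE2b seeded with ℓ + 1) in place of the paper's displacement functions, and the parameter values (load 0.81, i.e. m ≈ 1.23n; average bucket size 5) and all names are illustrative. Setting `k > 1` lets each slot hold up to k keys, which gives a k-perfect hash function.

```python
import hashlib
import math
from collections import Counter


def _hash(key: str, seed: int, size: int) -> int:
    """Seeded hash into [0, size); stands in for fully random hash functions."""
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little") % size


def build(keys, load=0.81, avg_bucket=5, k=1, max_index=1_000_000):
    """Hash-and-displace: return (m, r, index), where index[b] is the function chosen for bucket b.

    With k > 1 each of the m slots may receive up to k keys (k-perfect hashing).
    """
    n = len(keys)
    m = max(1, math.ceil(n / (load * k)))  # number of slots
    r = max(1, math.ceil(n / avg_bucket))  # number of buckets
    buckets = [[] for _ in range(r)]
    for key in keys:
        buckets[_hash(key, 0, r)].append(key)  # seed 0 is reserved for bucketing
    occupancy = [0] * m
    index = [0] * r
    # Place the largest buckets first, while the table is still mostly empty.
    for b in sorted(range(r), key=lambda b: len(buckets[b]), reverse=True):
        for ell in range(max_index):
            trial = Counter(_hash(key, ell + 1, m) for key in buckets[b])
            if all(occupancy[p] + c <= k for p, c in trial.items()):
                for p, c in trial.items():
                    occupancy[p] += c
                index[b] = ell  # this small integer is what would be compressed
                break
        else:
            raise RuntimeError(f"no displacement index found for bucket {b}")
    return m, r, index


def evaluate(key, m, r, index):
    """Look up the bucket's function index, then apply that function."""
    return _hash(key, index[_hash(key, 0, r)] + 1, m)
```

Processing large buckets first matters because their chance of landing entirely in free slots drops quickly as the table fills, while single-key buckets can still be placed cheaply near the end.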
A perfect hash function (PHF) h : S → [0, m − 1] for a key set S ⊆ U of size n, where m ≥ n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory-efficient storage and fast retrieval of items from static sets. In this paper we present a distributed and parallel version of a simple, highly scalable and near space-optimal perfect hashing algorithm for very large key sets, recently presented in [4]. The sequential implementation of the algorithm constructs an MPHF for a set of 1.024 billion URLs of average length 64 bytes collected from the Web in approximately 50 minutes using a commodity PC. The parallel implementation proposed here achieves the following performance using 14 commodity PCs: (i) it constructs an MPHF for the same set of 1.024 billion URLs in approximately 4 minutes; (ii) it constructs an MPHF for a set of 14.336 billion 16-byte random integers in approximately 50 minutes with a performance degradation of 20%; (iii) one version of the parallel algorithm distributes the description of the MPHF among the participating machines and evaluates it in a distributed way, faster than the centralized function.
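One simple way to make MPHF construction scale, and to spread it over several machines, is to partition the keys into small buckets with a first-level hash, build an MPHF for each bucket independently, and add a per-bucket offset equal to the number of keys in all earlier buckets. The sketch below illustrates only that partition-and-offset structure, under stated assumptions: the per-bucket MPHF is found by brute-force seed search, a stand-in that is feasible only for tiny buckets and is not the algorithm of [4]; seeded BLAKE2b hashing stands in for random hash functions; and all names and parameters are hypothetical.

```python
import hashlib
import math
from itertools import count


def _hash(key: str, seed: int, size: int) -> int:
    """Seeded hash into [0, size); stands in for fully random hash functions."""
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little") % size


def _bucket_seed(bucket):
    """Brute-force a seed that maps this bucket bijectively onto [0, len(bucket)).

    Stand-in for a real per-bucket MPHF; only practical because buckets are tiny.
    """
    for seed in count(1):  # seed 0 is reserved for partitioning
        if len({_hash(key, seed, len(bucket)) for key in bucket}) == len(bucket):
            return seed


def build(keys, avg_bucket=3):
    r = max(1, math.ceil(len(keys) / avg_bucket))
    buckets = [[] for _ in range(r)]
    for key in keys:
        buckets[_hash(key, 0, r)].append(key)
    # Buckets are independent: each one could be handled by a different machine.
    seeds = [_bucket_seed(b) for b in buckets]
    sizes = [len(b) for b in buckets]
    offsets, total = [], 0
    for size in sizes:
        offsets.append(total)  # number of keys in all earlier buckets
        total += size
    return r, sizes, seeds, offsets


def evaluate(key, r, sizes, seeds, offsets):
    """Global value = bucket offset + local MPHF value; defined only for keys in S."""
    b = _hash(key, 0, r)
    return offsets[b] + _hash(key, seeds[b], sizes[b])
```

Because evaluation touches only one bucket's seed, size and offset, the per-bucket descriptions can also be stored on different machines, with a key routed by its first-level hash.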