Faster 64-bit universal hashing using carry-less multiplications

Lemire, Daniel; Kaser, Owen

doi:10.1007/s13389-015-0110-5

Cited by 22 publications

(15 citation statements)

References 23 publications


“…Further, we have shown that an implementation of these non-cryptographic hash functions offered competitive speed (as fast as MurmurHash), and were substantially faster than the implementations found in C++ standard libraries. Our approach is similar to previous work on fast universal hash families (e.g., MMH [32], CLHASH [41], UMAC [27], VHASH [30], and Poly1305 [15]), except that we get good regularity in addition to the high speed and universality. To promote the use of our hash functions among practitioners and researchers, our implementation is freely available as open source software.…”

Section: Results (mentioning)

confidence: 79%

Regular and almost universal hashing: an efficient implementation

2016

Self Cite


Random hashing can provide guarantees regarding the performance of data structures such as hash tables, even in an adversarial setting. Many existing families of hash functions are universal: given two data objects, the probability that they have the same hash value is low given that we pick hash functions at random. However, universality fails to ensure that all hash functions are well behaved. We further require regularity: when picking data objects at random they should have a low probability of having the same hash value, for any fixed hash function. We present the efficient implementation of a family of non-cryptographic hash functions (PM+) offering good running times, good memory usage as well as distinguishing theoretical guarantees: almost universality and component-wise regularity. On a variety of platforms, our implementations are comparable to the state of the art in performance. On recent Intel processors, PM+ achieves a speed of 4.7 bytes per cycle for 32-bit outputs and 3.3 bytes per cycle for 64-bit outputs. We review vectorization through SIMD instructions (e.g., AVX2) and optimizations for superscalar execution. Comment: accepted for publication in Software: Practice and Experience in September 201
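The universality property defined in this abstract can be checked by brute force on a small family. The sketch below uses the textbook Carter-Wegman family h(x) = ((a*x + b) mod p) mod m as a stand-in; it is not the PM+ family from the paper. The prime P = 101, the bucket count M = 10, and all function names are assumptions chosen for illustration.

```python
from itertools import product

P = 101  # small prime (assumed for illustration); keys must lie in [0, P)
M = 10   # number of hash buckets (assumed for illustration)


def make_hash(a, b):
    """Return one member of the family, selected by the pair (a, b)."""
    def h(x):
        return ((a * x + b) % P) % M
    return h


def collision_fraction(x, y):
    """Fraction of functions in the family that send x and y to the same bucket."""
    # The family is indexed by a in 1..P-1 and b in 0..P-1.
    params = list(product(range(1, P), range(P)))
    collisions = 0
    for a, b in params:
        h = make_hash(a, b)
        if h(x) == h(y):
            collisions += 1
    return collisions / len(params)


if __name__ == "__main__":
    # Universality: for distinct keys, the collision probability over a
    # randomly chosen function is at most 1/M.
    print(collision_fraction(3, 58), "<=", 1 / M)
```

For any two distinct keys below P, the fraction is 920/10100 (about 0.091), which is under the 1/M = 0.1 bound. Regularity, the second property the abstract asks for, fixes the function and randomizes the keys instead, so it is not tested by this sketch.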


“…We can compute such a prefix sum in C++ with a loop that repeatedly apply the bitwise XOR on a left-shifted word: for (i=0;i<64;i++){mask = mask xor (mask << 1)}. This prefix sum can be more efficiently implemented as one instruction by using the carry-less multiplication [16] (implemented with the pclmulqdq instruction) of our unescaped quote bit vector by another 64-bit word made entirely of ones. The carry-less multiplication works like the regular integer multiplication, but, as the name suggests, without a carry because it relies on the XOR operation instead of the addition.…”
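The equivalence described in this statement, a prefix XOR obtained by carry-less multiplication with an all-ones word, can be sketched in software. The code below emulates the carry-less product with shifts and XORs rather than calling pclmulqdq, and compares it with a plain bit-by-bit running XOR. The function names are assumptions made for illustration and do not come from simdjson.

```python
MASK64 = (1 << 64) - 1  # 64-bit word made entirely of ones


def prefix_xor_reference(bits):
    """Bit i of the result is the XOR of bits 0..i of the input (64-bit)."""
    result = 0
    running = 0
    for i in range(64):
        running ^= (bits >> i) & 1   # toggle on every set input bit
        result |= running << i
    return result


def clmul(a, b):
    """Carry-less product of two 64-bit words (software emulation)."""
    product = 0
    for i in range(64):
        if (b >> i) & 1:
            product ^= a << i        # XOR replaces addition, so no carries
    return product


def prefix_xor_clmul(bits):
    """Prefix XOR as the low 64 bits of a carry-less multiply by all ones."""
    return clmul(bits, MASK64) & MASK64


if __name__ == "__main__":
    quotes = 0b00100100              # quote characters at bit positions 2 and 5
    print(bin(prefix_xor_clmul(quotes)))  # 0b11100
```

With quote bits at positions 2 and 5, the result has bits 2, 3 and 4 set: the span from the opening quote up to, but not including, the closing quote.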

Section: Identification of the Quoted Substrings (mentioning)

confidence: 99%

Parsing gigabytes of JSON per second

Langdale

Lemire

2019

The VLDB Journal

Self Cite


JavaScript Object Notation or JSON is a ubiquitous data exchange format on the Web. Ingesting JSON documents can become a performance bottleneck due to the sheer volume of data. We are thus motivated to make JSON parsing as fast as possible. Despite the maturity of the problem of JSON parsing, we show that substantial speedups are possible. We present the first standard-compliant JSON parser to process gigabytes of data per second on a single core, using commodity processors. We can use a quarter or fewer instructions than a state-of-the-art reference parser like RapidJSON. Unlike other validating parsers, our software (simdjson) makes extensive use of Single Instruction, Multiple Data (SIMD) instructions. To ensure reproducibility, simdjson is freely available as open-source software under a liberal license.


“…The overall number of cells is 131072 also in this case. The number of bits for the fingerprint a is set to 8,12 and 16 so that the standard cuckoo filter with d = 4, c = 1 gets a false positive rate of approximately 1.5%, 0.1%, and 0.006% at 95% occupancy. The same number of bits per cell is used for the different ACF configurations.…”
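The false positive rates quoted in this statement are consistent with a common first-order approximation for cuckoo filters: a negative query is compared against roughly d * c * occupancy stored fingerprints, each matching with probability 2^-a. The sketch below applies that approximation. The formula is an assumption about how the figures arise, not something stated in the excerpt, and the function name is hypothetical.

```python
def approx_false_positive_rate(fingerprint_bits, tables=4, cells_per_bucket=1,
                               occupancy=0.95):
    """First-order estimate: expected number of occupied candidate cells
    times the chance that a random fingerprint of the given width matches."""
    candidate_cells = tables * cells_per_bucket * occupancy
    return candidate_cells / (2 ** fingerprint_bits)


if __name__ == "__main__":
    for a in (8, 12, 16):
        rate = approx_false_positive_rate(a)
        print(f"a = {a:2d} bits: {100 * rate:.4f}%")
```

Under this approximation the three fingerprint widths give about 1.48%, 0.093% and 0.0058%, in line with the rounded 1.5%, 0.1% and 0.006% in the statement.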

Section: Simulations With Generated Queries (mentioning)

confidence: 99%

Adaptive Cuckoo Filters

Mitzenmacher

Pontarelli

Reviriego

2018

2018 Proceedings of the Twentieth Workshop on Algorithm Engineering and Experiments (ALENEX)


We introduce the adaptive cuckoo filter (ACF), a data structure for approximate set membership that extends cuckoo filters by reacting to false positives, removing them for future queries. As an example application, in packet processing queries may correspond to flow identifiers, so a search for an element is likely to be followed by repeated searches for that element. Removing false positives can therefore significantly lower the false positive rate. The ACF, like the cuckoo filter, uses a cuckoo hash table to store fingerprints. We allow fingerprint entries to be changed in response to a false positive in a manner designed to minimize the effect on the performance of the filter. We show that the ACF is able to significantly reduce the false positive rate by presenting both a theoretical model for the false positive rate and simulations using both synthetic data sets and real packet traces.
