Introspective Sorting and Selection Algorithms

Musser, David R.

doi:10.1002/(sici)1097-024x(199708)27:8<983::aid-spe117>3.0.co;2-#

Cited by 182 publications

(145 citation statements)

References 9 publications

Supporting

Mentioning

141

Contrasting

Unclassified


“…An adaptive fallback using a runtime check is a standard heuristic technique to avoid worst case performance in many algorithms. For example, introsort [19] used in the STL's std::sort library method uses quicksort with adaptive fallback to heapsort to avoid the O(N^2) worst-case performance of quicksort. From the results shown in Figure 10, for two input arrays with comparable sizes, we start execution with our SIMD algorithm using the block size setting of 4x4 (or 8x8 if we use STTNI on Xeon).…”

Section: Performance For Two Arrays Of Various Sizes (mentioning)

confidence: 99%

Faster set intersection with SIMD instructions by reducing branch mispredictions

2014


Set intersection is one of the most important operations for many applications such as Web search engines or database management systems. This paper describes our new algorithm to efficiently find set intersections with sorted arrays on modern processors with SIMD instructions and high branch misprediction penalties. Our algorithm efficiently exploits SIMD instructions and can drastically reduce branch mispredictions. Our algorithm extends a merge-based algorithm by reading multiple elements, instead of just one element, from each of two input arrays and compares all of the pairs of elements from the two arrays to find the elements with the same values. The key insight for our improvement is that we can reduce the number of costly hard-to-predict conditional branches by advancing a pointer by more than one element at a time. Although this algorithm increases the total number of comparisons, we can execute these comparisons more efficiently using the SIMD instructions and gain the benefits of the reduced branch misprediction overhead. Our algorithm is suitable to replace existing standard library functions, such as std::set_intersection in C++, thus accelerating many applications, because the algorithm is simple and requires no preprocessing to generate additional data structures. We implemented our algorithm on Xeon and POWER7+. The experimental results show our algorithm outperforms the std::set_intersection implementation delivered with gcc by up to 5.2x using SIMD instructions and by up to 2.1x even without using SIMD instructions for 32-bit and 64-bit integer datasets. Our SIMD algorithm also outperformed an existing algorithm that can leverage SIMD instructions.
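The block-wise idea in this abstract can be sketched without SIMD. The Python below is only an illustration under stated assumptions, not the authors' implementation: the function name `block_intersect` and the default block size of 4 are hypothetical, both inputs are assumed to be sorted lists without duplicates, and the all-pairs comparison that the paper performs with SIMD instructions is written here as a plain loop.

```python
def block_intersect(a, b, block=4):
    """Intersect two sorted, duplicate-free lists block by block (scalar sketch)."""
    out = []
    i = j = 0
    # Block phase: compare every pair from one block of each input, then
    # advance a whole block at a time instead of one element at a time.
    while i + block <= len(a) and j + block <= len(b):
        b_block = b[j:j + block]
        for x in a[i:i + block]:
            if x in b_block:  # all-pairs comparison (done with SIMD in the paper)
                out.append(x)
        a_last, b_last = a[i + block - 1], b[j + block - 1]
        # Advance the block whose largest element is smaller; both on a tie.
        if a_last <= b_last:
            i += block
        if b_last <= a_last:
            j += block
    # Tail phase: ordinary one-element-at-a-time merge for the leftovers.
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out
```

The block phase takes one advance decision per block rather than per element, which is the source of the reduced branch count the abstract describes; the price is extra comparisons inside each block.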


“…For the disk-enabled Vectorwise this is achieved by executing the query several times and reporting only the execution times of runs after the data was fully resident in RAM. In order to cover the most important scenarios, we report benchmark results using datasets…”

Section: Experimental Evaluation (mentioning)

confidence: 99%

“…Section 5, Figure 11). We therefore instantiated 32 threads to work on one relation with a total of 1600M (throughout the paper we use M = 2^20) tuples, each consisting of a 64-bit sort key and a 64-bit payload, in parallel. (1) We first chunked the relation and sorted the chunks of 50M tuples each as runs in parallel.…”

Section: Introduction (mentioning)

confidence: 99%

Massively parallel sort-merge joins in main memory multi-core database systems

2012


Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel multi-core data processing as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard-to-parallelize final merge step to create one complete sort order. Rather they work on the independently created runs in parallel. This way our MPSM algorithms are NUMA-affine as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory proves the competitive performance of MPSM on large main memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals; in particular, it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four.
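The central idea here, joining on independently sorted runs without first merging them into one global order, can be sketched in a few lines. The Python below is a sequential illustration only, not the MPSM algorithms themselves: the names `make_runs`, `merge_join`, and `run_based_join`, the `workers` parameter, and the `(key, payload)` tuple layout are all assumptions made for the example, and the partitioning and NUMA placement the abstract mentions are left out.

```python
def make_runs(relation, workers):
    """Split a list of (key, payload) tuples into chunks and sort each chunk by key."""
    size = max(1, -(-len(relation) // workers))  # ceiling division, at least 1
    return [sorted(relation[k:k + size], key=lambda t: t[0])
            for k in range(0, len(relation), size)]

def merge_join(r_run, s_run):
    """Merge-join two runs that are each sorted by key."""
    out = []
    i = j = 0
    while i < len(r_run) and j < len(s_run):
        rk, sk = r_run[i][0], s_run[j][0]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Find the group of equal keys in s_run, then pair it with
            # every r_run tuple that carries the same key.
            j_end = j
            while j_end < len(s_run) and s_run[j_end][0] == rk:
                j_end += 1
            while i < len(r_run) and r_run[i][0] == rk:
                for m in range(j, j_end):
                    out.append((rk, r_run[i][1], s_run[m][1]))
                i += 1
            j = j_end
    return out

def run_based_join(r, s, workers=4):
    """Join r and s by merge-joining every pair of independently sorted runs."""
    r_runs = make_runs(r, workers)
    s_runs = make_runs(s, workers)
    result = []
    for r_run in r_runs:  # each iteration is independent of the others
        for s_run in s_runs:
            result.extend(merge_join(r_run, s_run))
    return result
```

Because no run is ever merged with another run of the same relation, the outer loop iterations share no mutable state and could be handed to separate workers; the sketch runs them one after another for clarity.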


“…Introsort is based on Quicksort, but switches to heap-sort when the recursion depth gets too large. Since it is highly dependent on the computer system and compiler used, we only included it to give a hint as to what could be gained by sorting on the GPU instead of on the CPU [19].…”

Section: Experimental Evaluation (mentioning)

confidence: 99%

A Practical Quicksort Algorithm for Graphics Processors

Cederman

Tsigas

2008

Algorithms - ESA 2008

121


Abstract. In this paper we present GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multi-core graphics processors. Quicksort has previously been considered as an inefficient sorting solution for graphics processors, but we show that GPU-Quicksort often performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.
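The citation statement above describes Introsort as Quicksort that switches to heapsort when the recursion depth gets too large. A minimal Python sketch of that control flow follows. It is an illustration under assumptions, not the code of any particular standard library: the cutoff of 16 elements for insertion sort, the median-of-three pivot, the depth limit of twice the floor of log2(n), and all function names are choices made for the example, and the heapsort fallback uses `heapq` on a copy of the subrange rather than sorting in place.

```python
import heapq

SMALL = 16  # assumed cutoff below which insertion sort takes over

def _insertion_sort(a, lo, hi):
    for k in range(lo + 1, hi):
        x = a[k]
        m = k - 1
        while m >= lo and a[m] > x:
            a[m + 1] = a[m]
            m -= 1
        a[m + 1] = x

def _heapsort_range(a, lo, hi):
    # Fallback with O(n log n) worst case; works on a copy for brevity.
    chunk = a[lo:hi]
    heapq.heapify(chunk)
    a[lo:hi] = [heapq.heappop(chunk) for _ in range(len(chunk))]

def _partition(a, lo, hi):
    # Lomuto partition of a[lo:hi] around a median-of-three pivot.
    last = hi - 1
    mid = (lo + last) // 2
    pivot_idx = sorted((lo, mid, last), key=lambda k: a[k])[1]
    a[pivot_idx], a[last] = a[last], a[pivot_idx]
    pivot = a[last]
    store = lo
    for k in range(lo, last):
        if a[k] < pivot:
            a[k], a[store] = a[store], a[k]
            store += 1
    a[store], a[last] = a[last], a[store]
    return store

def _introsort(a, lo, hi, depth):
    while hi - lo > SMALL:
        if depth == 0:
            # Recursion went too deep: stop partitioning and use heapsort.
            _heapsort_range(a, lo, hi)
            return
        depth -= 1
        p = _partition(a, lo, hi)
        _introsort(a, p + 1, hi, depth)  # recurse on the right part
        hi = p                           # loop on the left part
    _insertion_sort(a, lo, hi)

def introsort(a):
    """Sort the list a in place."""
    if len(a) > 1:
        _introsort(a, 0, len(a), 2 * (len(a).bit_length() - 1))
```

An input of all-equal keys is a useful check for this sketch: the Lomuto partition used here degenerates on it, so the depth limit is reached and the heapsort branch does the work.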
