We present new sequential and parallel sorting algorithms that now represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. Somewhat surprisingly, part of the speed advantage is due to the additional feature of the algorithms to work in-place, i.e., they do not need a significant amount of space beyond the input array. Previously, the in-place feature often implied performance penalties. Our main algorithmic contribution is a blockwise approach to in-place data distribution that is provably cache-efficient. We also parallelize this approach, taking dynamic load balancing and memory locality into account. Our new comparison-based algorithm, In-place Parallel Super Scalar Samplesort (IPS⁴o), combines this technique with branchless decision trees. By taking cases with many equal elements into account and by adapting the distribution degree dynamically, we obtain a highly robust algorithm that outperforms the best previous in-place parallel comparison-based sorting algorithms by almost a factor of three. It also outperforms the best comparison-based competitors regardless of whether we consider in-place or not-in-place, parallel or sequential settings. Another surprising result is that IPS⁴o even outperforms the best (in-place or not-in-place) integer sorting algorithms in a wide range of situations. In many of the remaining cases (often involving near-uniform input distributions, small keys, or a sequential setting), our new In-place Parallel Super Scalar Radix Sort (IPS²Ra) turns out to be the best algorithm. Claims to have the (in some sense) "best" sorting algorithm can be found in many papers, and they cannot all be true. Therefore, we base our conclusions on an extensive experimental study involving a large part of the cross product of 21 state-of-the-art sorting codes, 6 data types, 10 input distributions, 4 machines, 4 memory allocation strategies, and input sizes varying over 7 orders of magnitude. This confirms the claims made about the robust performance of our algorithms while revealing major performance problems in many competitors outside the concrete set of measurements reported in the associated publications. This is particularly true for integer sorting algorithms, giving one reason to prefer comparison-based algorithms for robust general-purpose sorting.
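To make the "branchless decision tree" idea concrete, below is a minimal, hypothetical C++ sketch of splitter-based bucket classification in the style of super scalar samplesort. The class name, the fixed fan-out of 8, and the key type are illustrative assumptions and do not reflect the authors' actual IPS⁴o implementation.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Illustrative sketch (not IPS4o code): classify elements into k buckets
// using an implicit binary search tree of k-1 splitters. The per-level step
// "i = 2*i + (splitter < x)" contains no data-dependent branch, so the
// compiler can emit branch-free arithmetic instead of mispredicted jumps.
constexpr std::size_t kBuckets = 8;              // fan-out k (power of two)

struct Classifier {
    // Splitters stored as an implicit binary search tree, 1-based:
    // node i has children 2i and 2i+1; the leaves map to bucket indices.
    std::array<int, kBuckets> tree{};            // tree[1..k-1] used

    // Expects exactly kBuckets - 1 splitter values.
    explicit Classifier(std::vector<int> splitters) {
        std::sort(splitters.begin(), splitters.end());
        build(splitters, 1, 0, splitters.size());
    }

    // log2(k) iterations per element, each a comparison turned into an index
    // update; bucket b receives keys x with splitter[b-1] < x <= splitter[b].
    std::size_t bucket(int x) const {
        std::size_t i = 1;
        for (std::size_t level = 0; level < /*log2(k)=*/3; ++level)
            i = 2 * i + static_cast<std::size_t>(tree[i] < x);
        return i - kBuckets;                     // leaf index -> bucket id
    }

private:
    // Place the median of each splitter sub-range at the corresponding node,
    // so that an in-order walk of the implicit tree yields sorted splitters.
    void build(const std::vector<int>& s, std::size_t node,
               std::size_t lo, std::size_t hi) {
        if (node >= kBuckets) return;            // past the last internal node
        std::size_t mid = lo + (hi - lo) / 2;
        tree[node] = s[mid];
        build(s, 2 * node, lo, mid);
        build(s, 2 * node + 1, mid + 1, hi);
    }
};
```

In the distribution phase described in the abstract, such a classifier would be applied to every element; IPS⁴o additionally buffers elements in cache-sized blocks per bucket so that moving them into place is both in-place and cache-efficient, but that blockwise machinery is not shown in this sketch.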
Ordered (key-value) maps are an important and widely used data type for large-scale data processing frameworks. Beyond simple search, insertion, and deletion, more advanced operations such as range extraction, filtering, and bulk updates form a critical part of these frameworks. We describe an interface for ordered maps that is augmented to support fast range queries and sums, and introduce a parallel and concurrent library called PAM (Parallel Augmented Maps) that implements the interface. The interface includes a wide variety of functions on augmented maps, ranging from basic insertion and deletion to more interesting functions such as union, intersection, filtering, extracting ranges, splitting, and range sums. We describe algorithms for these functions that are efficient both in theory and in practice. As examples of the use of the interface and of the performance of PAM, we apply the library to four applications: simple range sums, interval trees, 2D range trees, and ranked word index searching. The interface greatly simplifies the implementation of these data structures over direct implementations. Sequentially, the code achieves performance that matches or exceeds that of existing libraries designed specifically for a single application, and in parallel our implementation gets speedups ranging from 40 to 90 on 72 cores with 2-way hyperthreading.
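To illustrate what an augmented map buys, here is a minimal, hypothetical C++ sketch of a search tree whose nodes cache the sum of the values in their subtree, which is the mechanism that makes range sums logarithmic on a balanced tree. It is a deliberately simple, unbalanced toy and does not mimic PAM's actual interface, which parameterizes the augmentation function and also provides parallel bulk operations such as union, intersection, filter, and split.

```cpp
#include <cstdint>
#include <memory>

// Illustrative sketch (not PAM code): a binary search tree where each node
// additionally stores "aug", the sum of the values in its subtree.
struct Node {
    int key;
    int64_t value;
    int64_t aug;                          // cached sum of values in subtree
    std::unique_ptr<Node> left, right;
    Node(int k, int64_t v) : key(k), value(v), aug(v) {}
};

static int64_t augOf(const Node* n) { return n ? n->aug : 0; }

// Insert (or overwrite) a key and keep the cached subtree sums correct.
void insert(std::unique_ptr<Node>& root, int key, int64_t value) {
    if (!root) { root = std::make_unique<Node>(key, value); return; }
    if (key < root->key)      insert(root->left, key, value);
    else if (key > root->key) insert(root->right, key, value);
    else                      root->value = value;
    root->aug = root->value + augOf(root->left.get()) + augOf(root->right.get());
}

// Sum of values with key < x: descends one root-to-leaf path and answers
// whole left subtrees from their cached sum, so it costs O(log n) on a
// balanced tree (this toy tree is not rebalanced).
int64_t sumBelow(const Node* n, int x) {
    if (!n) return 0;
    if (x <= n->key) return sumBelow(n->left.get(), x);
    return augOf(n->left.get()) + n->value + sumBelow(n->right.get(), x);
}

// Sum of values with keys in the half-open range [lo, hi).
int64_t rangeSum(const Node* root, int lo, int hi) {
    return sumBelow(root, hi) - sumBelow(root, lo);
}
```

The range-sum application mentioned in the abstract corresponds to exactly this kind of cached-aggregate query; the interval-tree, range-tree, and word-index applications swap in different augmentation functions over the same tree structure.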