The 31st ACM Symposium on Parallelism in Algorithms and Architectures 2019
DOI: 10.1145/3323165.3323198

Theoretically-Efficient and Practical Parallel In-Place Radix Sorting

Abstract: Radix sort stands out for having a better worst-case theoretical bound than any comparison-based sort for fixed-length integers. Although radix sort can be implemented either in-place or in parallel, there exists no parallel in-place implementation of radix sort that guarantees a sub-linear worst-case span. The challenge arises from read-write races when reading from and writing to the same array. In this thesis, I introduce Regions sort and use it to implement a parallel work-efficient in-pla…
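
To make the race concrete, here is a minimal sequential sketch of an in-place most-significant-digit radix sort in the style of American flag sort. It is not Regions sort and is not taken from the paper; the function name `msd_radix_sort_inplace`, the 32-bit non-negative key assumption, and the 8-bit digit width are illustrative choices. The point to notice is the swap: while filling one bucket, the code reads from and writes to slots that belong to other buckets in the same array, which is the access pattern that would race if buckets were processed concurrently without coordination.

```python
def msd_radix_sort_inplace(a, lo=0, hi=None, shift=24, bits=8):
    """Sort a[lo:hi] in place, assuming non-negative integers below 2**32.

    Sequential illustration only (American-flag-style), not Regions sort.
    """
    if hi is None:
        hi = len(a)
    if hi - lo <= 1 or shift < 0:
        return

    radix = 1 << bits
    mask = radix - 1

    # Pass 1: histogram of the current digit.
    counts = [0] * radix
    for i in range(lo, hi):
        counts[(a[i] >> shift) & mask] += 1

    # Bucket boundaries inside the same array.
    starts = [0] * radix
    pos = lo
    for b in range(radix):
        starts[b] = pos
        pos += counts[b]
    ends = [starts[b] + counts[b] for b in range(radix)]

    # Pass 2: permute in place. nxt[b] is the next unfilled slot of bucket b.
    nxt = starts[:]
    for b in range(radix):
        while nxt[b] < ends[b]:
            d = (a[nxt[b]] >> shift) & mask
            if d == b:
                nxt[b] += 1  # already in its home bucket
            else:
                # Writes into bucket d while reading from bucket b:
                # the same-array read/write pattern that races in parallel.
                a[nxt[b]], a[nxt[d]] = a[nxt[d]], a[nxt[b]]
                nxt[d] += 1

    # Recurse into each bucket on the next less significant digit.
    for b in range(radix):
        msd_radix_sort_inplace(a, starts[b], ends[b], shift - bits, bits)
```

A call such as `msd_radix_sort_inplace(data)` sorts the list `data` without allocating a second array of the same size; only the per-call digit histograms are extra.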

Cited by 19 publications (25 citation statements)
References: 41 publications
“…Note that, in order to use Radix Sort with IEEE-754 floating point numbers, it is first necessary to shift and mask the bit representation. While Radix Sort is highly sensitive to the key length, which dictates the number of passes, it is nevertheless a very efficient sorting algorithm for numerical types, that is very well-suited for multi-core procedures [6,22,40], and SIMD vectorization [50].…”
Section: Related Work (mentioning)
confidence: 99%
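
The bit-level preparation this statement refers to can be sketched as follows. This is one common key transformation for 64-bit IEEE-754 doubles (set the sign bit of non-negative values, flip every bit of negative values), offered as an assumption about what such a step might look like rather than as the citing authors' method. The helper names `float_to_sortable_key`, `sortable_key_to_float`, and `radix_sort_floats` are hypothetical, the sort shown is a plain out-of-place least-significant-digit radix sort, and NaN inputs are not handled.

```python
import struct

SIGN_BIT = 1 << 63
ALL_ONES = (1 << 64) - 1


def float_to_sortable_key(x):
    """Map a double to a 64-bit unsigned key whose order matches float order."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    if bits & SIGN_BIT:
        return bits ^ ALL_ONES  # negative: flip all bits to reverse the order
    return bits | SIGN_BIT      # non-negative: move above all negatives


def sortable_key_to_float(key):
    """Inverse of float_to_sortable_key."""
    bits = key ^ SIGN_BIT if key & SIGN_BIT else key ^ ALL_ONES
    (x,) = struct.unpack("<d", struct.pack("<Q", bits))
    return x


def radix_sort_floats(values, bits_per_pass=8):
    """Out-of-place LSD radix sort over the transformed keys (illustrative)."""
    keys = [float_to_sortable_key(v) for v in values]
    mask = (1 << bits_per_pass) - 1
    # The key length (64 bits) fixes the number of passes: 64 / bits_per_pass.
    for shift in range(0, 64, bits_per_pass):
        buckets = [[] for _ in range(mask + 1)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]  # stable concatenation
    return [sortable_key_to_float(k) for k in keys]
```

The loop over `shift` also shows why the statement calls radix sort sensitive to key length: with 8-bit digits, a 64-bit key needs eight passes where a 32-bit key would need four.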
“…As baselines, we compare against cache-optimized and highly tuned C++ implementations of Radix Sort [51], Timsort [18], Introsort (std::sort), Histogram Sort [4], and IS⁴o [49] (one of the most optimized sorting algorithms we were able to find, which was also recently used in other studies [40] as a comparison point). Note that we use a recursive, equidepth version of Histogram sort that adapts to the input's skew as to avoid severe performance penalties.…”
Section: Setup and Datasets (mentioning)
confidence: 99%
“…These results demonstrate that Vortex-S achieves a substantial improvement over the previous methods, while incurring negligible RAM overhead 𝜖. On Skylake-X (i.e., 𝑐₃), it beats the fastest in-place methods [42], [48] by 3−4× and STL quicksort by 11×. While Vortex-S is hands-down the fastest technique that can sort 24 GB of keys on these machines, it is interesting to see how its performance stacks up against the best out-of-place methods.…”
Section: Sorting (mentioning)
confidence: 70%
“…Consequently, pertinent top-𝑘 applications do not adopt priority queue-based top-𝑘. Instead, they use sort-and-choose approach for top-𝑘 computing on GPUs [6,18,33,44,48]. However, as shown in Figure 17, the GPU-based sort-and-choose top-𝑘 [6] takes much longer time than GPU-based top-k algorithms.…”
mentioning
confidence: 99%