2012
DOI: 10.1142/s0129626412500089

Deterministic Sample Sort for GPUs

Abstract: We present and evaluate GPU Bucket Sort, a parallel deterministic sample sort algorithm for many-core GPUs. Our method is considerably faster than Thrust Merge (Satish et al., Proc. IPDPS 2009), the best comparison-based sorting algorithm for GPUs, and it is as fast as the new randomized sample sort for GPUs by Leischner et al. (to appear in Proc. IPDPS 2010). Our deterministic sample sort has the advantage that bucket sizes are guaranteed and therefore its running time does not have the input data dependent …

Cited by 18 publications (28 citation statements)
References 9 publications
“…The algorithm of [34] was used by [8] in the context of integer sorting; initial local sorting involved radix-sort. Subsequently [9,10] utilized this approach for GPU sorting of arbitrary (not necessarily integer) keys. Similarly to the approach of [13,15] but differently, a sample of size ps is being used rather than the p(p − 1)s of [13,15].…”
Section: Related Work
mentioning
confidence: 99%
“…We can still draw however some reliable conclusions and reason about the performance of these implementations using the MBSP model, thus making MBSP useful and usable. Integer sorting on multicores and GPUs can be realized by traditional distribution-specific algorithms such as radix-sort [3,12,25,28], or variants of it that use fewer rounds of the baseline count-sort implementation provided additional information about key values is available [6,39]. Other approaches include algorithms that use specialized hardware or software features of a particular multicore architecture [4,6,22,25]. Comparison-based algorithms have also been used with some obvious tweaks: use of deterministic regular sampling sorting [34] that utilizes serial radix-sort for local sorting [8,9,10] or use other methods for local sorting [38,3,5,6,22]. Network-based algorithms such as Batcher's [1] bitonic sorting [23,3,30,31,5] have also been utilized.…”
mentioning
confidence: 99%
“…Sample sorts generalize quicksorts; rather than splitting the input in 2 or 3 parts as in quicksort, they choose representative or random splitter elements to divide the input elements into many buckets, typically computing histograms to derive element offsets, then sort each bucket independently. Leischner et al [12] use random splitters and Dehne and Zaboli [8] deterministic splitters in their GPU implementations. Merge sorts are O(n log n) comparison sorts that recursively merge multiple sorted subsequences into a single sorted sequence.…”
Section: Related Work
mentioning
confidence: 99%
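
The pipeline described in the statement above (choose splitters, build a histogram, derive offsets, scatter, then sort each bucket) can be sketched in plain sequential Python. This is only an illustrative sketch under stated assumptions, not the GPU implementation of either cited paper: the name `sample_sort` and the parameters `num_buckets` and `samples_per_chunk` are hypothetical, and equidistant sampling from locally sorted chunks stands in here for deterministic splitter selection.

```python
from bisect import bisect_right
from itertools import accumulate


def sample_sort(data, num_buckets=4, samples_per_chunk=4):
    """Sequential sketch of sample sort with deterministically chosen splitters."""
    n = len(data)
    if n <= num_buckets * samples_per_chunk:
        return sorted(data)  # too small to be worth bucketing

    # 1. Split the input into chunks and sort each chunk locally.
    chunk_len = -(-n // num_buckets)  # ceiling division
    chunks = [sorted(data[i:i + chunk_len]) for i in range(0, n, chunk_len)]

    # 2. Take equidistant samples from every sorted chunk.
    samples = []
    for chunk in chunks:
        step = max(1, len(chunk) // samples_per_chunk)
        samples.extend(chunk[step - 1::step])
    samples.sort()

    # 3. Pick num_buckets - 1 equidistant splitters from the sorted samples.
    splitters = [samples[(k * len(samples)) // num_buckets]
                 for k in range(1, num_buckets)]

    # 4. Histogram: count how many elements fall into each bucket.
    bucket_of = [bisect_right(splitters, x) for x in data]
    counts = [0] * num_buckets
    for b in bucket_of:
        counts[b] += 1

    # 5. An exclusive prefix sum of the counts gives each bucket's start offset.
    offsets = [0] + list(accumulate(counts))[:-1]

    # 6. Scatter every element into its bucket's region of the output.
    out = [None] * n
    cursor = offsets[:]
    for x, b in zip(data, bucket_of):
        out[cursor[b]] = x
        cursor[b] += 1

    # 7. Sort each bucket independently (the step a GPU would run in parallel).
    for start, size in zip(offsets, counts):
        out[start:start + size] = sorted(out[start:start + size])
    return out
```

Because the buckets are ordered by the splitters, concatenating the independently sorted buckets yields a fully sorted result; the splitters only affect how evenly the work is divided, not correctness.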
“…ison sorts on the GPU include a bitonic sort by Peters et al [16], a bitonic-based merge sort (named Warpsort) by Ye et al [26] a Quicksort by Cederman and Tsigas [5] and sample sorts by Leischner et al [12] and Dehne and Zaboli [8].…”
Section: Introduction
mentioning
confidence: 99%