2008
DOI: 10.1016/j.jpdc.2008.05.012

Fast parallel GPU-sorting using a hybrid algorithm

Abstract: This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achieves high speed by efficiently utilizing the parallelism of the GPU throughout the whole algorithm. Initially, a parallel bucketsort splits the list into enough sublists to then be sorted in parallel using merge-sort. The parallel bucketsort, implemented in NVIDIA's CUDA, utilizes the synchronization mechanisms, such as atomic increment, that are available on modern GPUs. The mergesort requires scattered …
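The two-phase structure described in the abstract can be sketched sequentially in Python. This is a minimal illustration under stated assumptions, not the authors' CUDA implementation: the pivot selection from a random sample, the helper names (`atomic_inc`, `merge_sort`, `hybrid_sort`) and the `num_buckets` parameter are all hypothetical, and each loop iteration stands in for one GPU thread. `atomic_inc` mimics a GPU atomic increment by returning the counter's previous value, which gives every element a unique slot inside its bucket.

```python
import bisect
import random
from typing import List


def atomic_inc(counters: List[int], idx: int) -> int:
    """Sequential stand-in for a GPU atomic increment: returns the old value."""
    old = counters[idx]
    counters[idx] = old + 1
    return old


def merge_sort(xs: List[float]) -> List[float]:
    """Plain top-down merge sort, applied to one sublist."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def hybrid_sort(data: List[float], num_buckets: int = 8, seed: int = 0) -> List[float]:
    """Bucketsort into sublists, then merge-sort each sublist (hypothetical sketch)."""
    if not data:
        return []
    # Assumption: pivots are picked from a sorted random sample (not from the paper).
    rng = random.Random(seed)
    sample = sorted(rng.choice(data) for _ in range(num_buckets * 4))
    step = len(sample) // num_buckets
    pivots = [sample[k * step] for k in range(1, num_buckets)]

    # Phase 1: each "thread" (one per element) finds its bucket and claims a
    # slot via atomic increment; the returned old count is its index in the bucket.
    counts = [0] * num_buckets
    bucket_of = [0] * len(data)
    local_idx = [0] * len(data)
    for tid, x in enumerate(data):
        b = bisect.bisect_right(pivots, x)
        bucket_of[tid] = b
        local_idx[tid] = atomic_inc(counts, b)

    # Exclusive prefix sum of the counts gives each bucket's start offset.
    offsets, running = [], 0
    for c in counts:
        offsets.append(running)
        running += c

    # Scatter: every element writes to a unique, precomputed position.
    out = [0.0] * len(data)
    for tid, x in enumerate(data):
        out[offsets[bucket_of[tid]] + local_idx[tid]] = x

    # Phase 2: the sublists are independent, so a GPU could sort them in
    # parallel; here they are merge-sorted one after another.
    result: List[float] = []
    for b in range(num_buckets):
        start, end = offsets[b], offsets[b] + counts[b]
        result.extend(merge_sort(out[start:end]))
    return result


if __name__ == "__main__":
    print(hybrid_sort([0.7, 0.1, 0.5, 0.3], num_buckets=2))  # [0.1, 0.3, 0.5, 0.7]
```

Returning the old counter value is what lets the scatter step proceed without write collisions: combined with the bucket offsets from the prefix sum, each element receives a distinct output index, so the per-bucket merge sorts can start without further synchronization.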

Cited by 128 publications (91 citation statements)
References: 8 publications
“…Furthermore, GPUABiSort [9] was proposed, that is based on adaptive bitonic sort [2] and rearranges the data using bitonic trees to reduce the number of comparisons. Recently added GPU capabilities like scattered writes, flexible comparisons and atomic operations on memory have enabled methods combining radixsort and mergesort to achieve faster performances on modern GPUs [19,21,22].…”
Section: Related Work (mentioning)
confidence: 99%
“…Table 2 lists the best reported execution times⁴ (in seconds) for varying dataset sizes on various architectures (IBM Cell [7], Nvidia 8600 GTS [22], Nvidia 8800 GTX and Quadro FX 5600 [19], Intel 2-core Xeon with Quicksort [7], and IBM PowerPC 970MP [11]). Our performance numbers are faster than those reported on other architectures.…”
Section: Comparison With Analytical Model (mentioning)
confidence: 99%
“…Global Radix: Uses radix sort on the entire sequence [1]. Hybridsort: Uses a bucket sort followed by a merge sort [15]. STL-Introsort: This is the Introsort implementation found in the C++ Standard Library.…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
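The "Global Radix" baseline is described only as radix sort over the entire sequence. For reference, a generic least-significant-digit radix sort is sketched below in Python; it assumes unsigned 32-bit integer keys, the function name and `bits_per_pass` parameter are hypothetical, and it is not the implementation of [1]. Each pass builds a digit histogram, turns it into starting offsets with an exclusive prefix sum, and scatters stably, a pattern that GPU radix sorts commonly parallelize with prefix-sum (scan) primitives.

```python
from typing import List


def radix_sort_u32(keys: List[int], bits_per_pass: int = 8) -> List[int]:
    """Sequential LSD radix sort; assumes every key satisfies 0 <= k < 2**32."""
    radix = 1 << bits_per_pass
    mask = radix - 1
    out = list(keys)
    for shift in range(0, 32, bits_per_pass):
        # Histogram of the current digit.
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum: starting output position of each digit value.
        total = 0
        for d in range(radix):
            counts[d], total = total, total + counts[d]
        # Stable scatter into a fresh buffer, preserving order among equal digits.
        nxt = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            nxt[counts[d]] = k
            counts[d] += 1
        out = nxt
    return out
```

Stability of each scatter is what makes the digit-by-digit passes compose: after the pass on the most significant digit, keys are fully ordered.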
“…Sengupta et al [1] have presented a radix-sort and a Quicksort implementation. Recently, Sintorn et al [15] presented a sorting algorithm that combines bucket sort with merge sort.…”
Section: Introduction (mentioning)
confidence: 99%
“…Purcell et al [10] have presented an implementation of bitonic merge sort on GPU based on an implementation by Kapasi et al [11]. Sintorn et al [12] presented a hybrid sorting algorithm which splits the data with a bucket sort and then uses merge sort on the resulting blocks.…”
Section: Related Work (mentioning)
confidence: 99%
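Bitonic merge sort, cited above for Purcell et al.'s GPU implementation, is built from a fixed network of compare-exchange operations whose pattern does not depend on the data. The Python sketch below is a generic, sequential bitonic sorting network for power-of-two input lengths; it is not Purcell et al.'s implementation and not the adaptive variant behind GPUABiSort, and the function name is hypothetical.

```python
from typing import List


def bitonic_sort(xs: List[float]) -> List[float]:
    """Iterative bitonic sorting network; len(xs) must be a power of two."""
    n = len(xs)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(xs)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # compare distance within this merge step
            # Every compare-exchange in this pass touches a disjoint pair, so a
            # GPU could run one per thread; here they run in a plain loop.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Because the comparison pattern is fixed in advance, each pass maps naturally to one parallel step on a GPU; the trade-off is O(n log² n) comparisons, compared with O(n log n) for merge sort.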