CUDA‐quicksort: an improved GPU‐based implementation of quicksort

Manca, Elias; Manconi, Andrea; Orro, Alessandro; Armano, Giuliano; Milanesi, Luciano

doi:10.1002/cpe.3611

Cited by 21 publications

(14 citation statements)

References 19 publications


“…Clustering is performed by sorting the prefixes of the read sequences with our GPU-based CUDA-Quicksort [25]. As CUDA-Quicksort sorts numerical values, it is necessary to encode the prefixes of the read sequences.…”

Section: Methods (mentioning)

confidence: 99%

“…The comparative assessment has been made in the task of sorting items with long keys, characterized by 19 digits (i.e., the maximum number of digits used to represent the encoded read prefixes). Experiments, performed ensuring a uniform distribution on benchmark datasets (with sizes varying from 1M to 32M elements), show that CUDA-Quicksort outperforms Thrust Radix Sort with a speed-up ranging from 1.58x to 2.18x, depending on the dataset at hand [25]. …”

Section: Methods (mentioning)

confidence: 99%
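
The encoding step these statements describe can be illustrated with a minimal Python sketch. The base-5 alphabet `ACGTN` is an assumption, not something stated in the quoted text; it is used here because the largest 27-base key, 5^27 - 1, has 19 digits and still fits in an unsigned 64-bit integer, which is consistent with the 19-digit keys mentioned above. The names `encode_prefix` and `cluster_by_prefix` are hypothetical, and Python's built-in `sorted` stands in for CUDA-Quicksort.

```python
# Assumed alphabet (not confirmed by the source): four bases plus N.
ALPHABET = "ACGTN"
CODE = {base: digit for digit, base in enumerate(ALPHABET)}


def encode_prefix(prefix: str) -> int:
    """Read the prefix as a base-5 number, most significant base first."""
    value = 0
    for base in prefix:
        value = value * len(ALPHABET) + CODE[base]
    return value


def cluster_by_prefix(reads, k):
    """Group reads whose first k bases encode to the same numeric key.

    Sorting by key places reads with identical prefixes next to each
    other, so clusters can be collected in a single pass.
    """
    keyed = sorted((encode_prefix(read[:k]), read) for read in reads)
    clusters = []
    last_key = None
    for key, read in keyed:
        if key == last_key:
            clusters[-1].append(read)
        else:
            clusters.append([read])
            last_key = key
    return clusters
```

Each returned cluster holds only potential duplicates; a separate comparison of the remaining bases would still be needed to confirm them.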


Removing duplicate reads using graphics processing units

et al. 2016

Self Cite


Background: During library construction, polymerase chain reaction is used to enrich the DNA before sequencing. Typically, this process generates duplicate read sequences. Removal of these artifacts is mandatory, as they can affect the correct interpretation of data in several analyses. Ideally, duplicate reads should be characterized by identical nucleotide sequences. However, due to sequencing errors, duplicates may also be nearly identical. Removing nearly identical duplicates can require a notable computational effort. To deal with this challenge, we recently proposed a GPU method aimed at removing identical and nearly identical duplicates generated with an Illumina platform. The method implements an approach based on prefix-suffix comparison. Read sequences with identical prefixes are considered potential duplicates. Then, their suffixes are compared to identify and remove those that are actually duplicated. Although the method can be efficiently used to remove duplicates, there are some limitations that need to be overcome. In particular, it cannot detect potential duplicates when prefixes are longer than 27 bases, and it does not provide support for paired-end read libraries. Moreover, large clusters of potential duplicates are split into smaller ones to guarantee a reasonable computing time. This heuristic may affect the accuracy of the analysis.

Results: In this work we propose GPU-DupRemoval, a new implementation of our method able to (i) cluster reads without constraints on the maximum length of the prefixes, (ii) support both single- and paired-end read libraries, and (iii) analyze large clusters of potential duplicates.

Conclusions: Due to the massive parallelization obtained by exploiting graphics cards, GPU-DupRemoval removes duplicate reads faster than other cutting-edge solutions, while outperforming most of them in terms of the number of duplicate reads removed.



“…Quicksort [23] is based on a partitioning operation: first, the algorithm divides a large array into two smaller sub-arrays, the lower elements and the higher elements. It is divided into different steps: 1.…”

Section: Quicksort (mentioning)

confidence: 99%
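
The partitioning operation this statement refers to can be sketched in Python as follows. The quoted steps are truncated, so this is a generic textbook variant (Lomuto partitioning with the last element as pivot), not necessarily the scheme used in the citing paper; the function names are illustrative.

```python
def partition(items, lo, hi):
    """Rearrange items[lo..hi] around the pivot items[hi].

    Elements smaller than the pivot end up to its left, the rest to its
    right. Returns the pivot's final index.
    """
    pivot = items[hi]
    boundary = lo  # next slot for an element smaller than the pivot
    for i in range(lo, hi):
        if items[i] < pivot:
            items[i], items[boundary] = items[boundary], items[i]
            boundary += 1
    items[boundary], items[hi] = items[hi], items[boundary]
    return boundary


def quicksort(items, lo=0, hi=None):
    """Sort items in place by recursively partitioning each sub-array."""
    if hi is None:
        hi = len(items) - 1
    if lo < hi:
        p = partition(items, lo, hi)
        quicksort(items, lo, p - 1)   # lower elements
        quicksort(items, p + 1, hi)   # higher elements
    return items
```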

A Comparative Study of Sorting Algorithms with FPGA Acceleration by High Level Synthesis

Jmaa, Atitallah, Duvivier et al. 2019

CyS


Nowadays, sorting is an important operation for several real-time embedded applications. It is one of the most commonly studied problems in computer science. It is particularly relevant for applications such as avionic systems and decision support systems, which need a sorting algorithm for their implementation. However, sorting a large number of elements and/or making decisions in real time requires high processing speed. Therefore, accelerating sorting algorithms using FPGA can be an attractive solution. In this paper, we propose an efficient hardware implementation of different sorting algorithms (BubbleSort, InsertionSort, SelectionSort, QuickSort, HeapSort, ShellSort, MergeSort and TimSort) from high-level descriptions on the Zynq-7000 platform. In addition, we compare the performance of the different algorithms in terms of execution time, standard deviation and resource utilization. From the experimental results, we show that SelectionSort is 1.01-1.23 times faster than the other algorithms when N < 64; otherwise, TimSort is the best algorithm.


“…Manca et al. propose CUDA‐quicksort, an iterative GPU‐based implementation of the sorting algorithm. The quicksort that they propose is based on the GPU‐quicksort implementation, wherein the process has two major steps.…”

Section: Parallel Multi‐key Quicksort (mentioning)

confidence: 99%

“…When the size of the input is very small and no more quicksort can be applied, a bitonic sort [31] is applied to get the final result. Manca et al. [29] propose CUDA-quicksort, an iterative GPU-based implementation of the sorting algorithm. The quicksort that they propose is based on the GPU-quicksort implementation, wherein the process has two major steps.…”

Section: CPU Parallel Multi-key Quicksort (mentioning)

confidence: 99%
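
The two ideas in this statement, an iterative quicksort and a bitonic sort for small inputs, can be sketched sequentially in Python. This is only an illustration of the general pattern, not the CUDA-quicksort or GPU-quicksort implementation: the quoted text does not spell out the two major steps, so none are reproduced here. The threshold `SMALL_SEGMENT` and the function names are hypothetical, and the bitonic network assumes numeric keys because it pads with infinity.

```python
import math

SMALL_SEGMENT = 8  # hypothetical cut-off below which bitonic sort takes over


def bitonic_sort(values):
    """Sort numeric values with a bitonic network, padded to a power of two."""
    n = 1
    while n < len(values):
        n *= 2
    data = list(values) + [math.inf] * (n - len(values))
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data[:len(values)]  # padding sorts to the end and is dropped


def iterative_quicksort(values):
    """Quicksort driven by an explicit work list instead of recursion."""
    data = list(values)
    pending = [(0, len(data))]  # half-open segments still to be sorted
    while pending:
        lo, hi = pending.pop()
        if hi - lo <= SMALL_SEGMENT:
            data[lo:hi] = bitonic_sort(data[lo:hi])
            continue
        segment = data[lo:hi]
        pivot = segment[len(segment) // 2]
        lower = [x for x in segment if x < pivot]
        equal = [x for x in segment if x == pivot]
        higher = [x for x in segment if x > pivot]
        data[lo:hi] = lower + equal + higher
        pending.append((lo, lo + len(lower)))
        pending.append((hi - len(higher), hi))
    return data
```

Replacing recursion with a work list matters on GPUs without dynamic parallelism, where a kernel cannot launch further kernels; the list here plays the role of the queue of sub-sequences awaiting processing.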

Kepler GPU accelerated recursive sorting using dynamic parallelism

Neelima, Shamsundar, Narayan et al. 2016

Concurrency and Computation


Summary: This paper focuses on the performance gain obtained on the Kepler graphics processing units (GPUs) for multi‐key quicksort. Because multi‐key quicksort is a recursive algorithm, many researchers have found it tedious to parallelize on multi‐ and many‐core architectures. A survey of the state‐of‐the‐art string sorting algorithms and a close study of the Kepler GPU architecture gave rise to the research idea of matching the template of multi‐key quicksort with the dynamic parallelism feature offered by Kepler‐based GPUs. The CPU parallel implementation shows an improvement of 33 to 50% and 62 to 75% when compared with 8‐bit and 16‐bit parallel most significant digit radix sort, respectively. The GPU implementation of multi‐key quicksort gives a 6× to 18× speed‐up compared with the CPU parallel implementation of multi‐key quicksort. The GPU implementation of multi‐key quicksort achieves a 1.5× to 3× speed‐up when compared with the GPU implementation of a string sorting algorithm using singleton elements in the literature. Copyright © 2016 John Wiley & Sons, Ltd.
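
For readers unfamiliar with multi‐key quicksort, the recursive structure the summary refers to can be sketched in Python. This is the standard sequential three‐way scheme for strings, written for clarity; it is not the paper's CPU or GPU implementation, and the function name is illustrative.

```python
def multikey_quicksort(strings, depth=0):
    """Sort strings by three-way partitioning on the character at `depth`."""
    if len(strings) <= 1:
        return list(strings)

    def char_at(s):
        # Strings that end before `depth` sort ahead of all longer ones.
        return ord(s[depth]) if depth < len(s) else -1

    pivot = char_at(strings[len(strings) // 2])
    lower = [s for s in strings if char_at(s) < pivot]
    equal = [s for s in strings if char_at(s) == pivot]
    higher = [s for s in strings if char_at(s) > pivot]

    # Strings in `equal` share the character at `depth`, so only they
    # advance to the next character; the other groups stay at `depth`.
    if pivot >= 0:
        equal = multikey_quicksort(equal, depth + 1)
    return (multikey_quicksort(lower, depth)
            + equal
            + multikey_quicksort(higher, depth))
```

The three recursive calls are independent of one another, which is what makes the algorithm a natural fit for a feature that lets one unit of work spawn further work.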

