2019
DOI: 10.1016/j.parco.2019.01.004

A hybrid CPU/GPU approach for optimizing sorting throughput

Cited by 15 publications (10 citation statements)
References 37 publications
“…It is well documented that PCIe data transfers are a bottleneck in GPU computing (Fujii et al., 2013; Van Werkhoven et al., 2014; Gowanlock and Karsin, 2019). In the case of the L-S algorithm, transferring periodogram(s) from the GPU back to the host requires a non-negligible amount of time.…”
Section: Transferring the Periodogram To The Host (mentioning)
confidence: 99%
“…In the case of the L-S algorithm, transferring periodogram(s) from the GPU back to the host requires a non-negligible amount of time. To reduce this bottleneck, we employ the methods in Gowanlock and Karsin (2019) that reduce the overhead of performing host/device data transfers. We give a brief overview of the data transfer method here, but refer the interested reader to Gowanlock and Karsin (2019) for more detail.…”
Section: Transferring the Periodogram To The Host (mentioning)
confidence: 99%
“…Parallel computing devices targeted by OpenCL include PCs, servers, handheld devices, and embedded platforms [7]. It acts on the heterogeneous platform in the form of API, and the computing resources of multiple microprocessors, graphics processors, reconfigurable hardware, digital signal processors, and other platforms can be regarded as computing units for scheduling and sharing, so as to realize parallel computing across hardware architectures [8]. This thesis fully considers the cross-platform characteristics and aims to study how to design a parallel algorithm that can be processed on different GPU platforms with the help of the OpenCL programming model to solve the performance bottleneck of the radix sorting algorithm.…”
Section: Introduction (mentioning)
confidence: 99%
“…In response to the limitations of CUDA on heterogeneous platforms, Apple introduced a public, open cross-platform parallel computing standard called Open Computing Language (OpenCL). Parallel computing devices targeted by OpenCL include PCs, servers, handheld devices, and embedded platforms [7]. It acts on the heterogeneous platform in the form of API, and the computing resources of multiple microprocessors, graphics processors, reconfigurable hardware, digital signal processors, and other platforms can be regarded as computing units for scheduling and sharing, so as to realize parallel computing across hardware architectures [8]…”
Section: Introduction (mentioning)
confidence: 99%