2008
DOI: 10.1109/ipdps.2008.4536163

An efficient, model-based CPU-GPU heterogeneous FFT library

Abstract: General-Purpose computing on Graphics Processing Units (GPGPU) is becoming popular in HPC because of its high peak performance. However, despite the potential performance improvements and recent promising results in scientific computing applications, its real performance is not necessarily higher than that of current high-performance CPUs, especially given the recent trend towards more cores on a single die. This is because the GPU performance can be severely limited by such restr…

Cited by 33 publications (10 citation statements)
References 10 publications
“…They are typically used in applications where data locality is important because they do not require data redistribution. The methods [32], [33], [34] solve the single-objective optimization problem for performance on heterogeneous platforms. The methods [11], [12], [15] solve the bi-objective optimization problem for performance and energy for homogeneous and heterogeneous platforms.…”
Section: Static and Dynamic Optimization Methods (mentioning)
confidence: 99%
“…Additionally, to decrease the overhead for data transfer between the host and device memories, they performed matrix transposition before sending data. Chen and Li [10] extended the approach of Gu and others [16] and used both a GPU and CPU for FFT computations, similar to Ogata and others [24]. Unlike Gu and others [16], they used a 2D data-copy application programming interface (API) instead of gathering multiple subarrays before sending them to transfer multidimensional data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Performance models have been proposed to implement work-distribution schemes (Choi et al 2013; Zhong et al 2012). In Ogata et al (2008), the authors present a library for 2D Fast Fourier Transform (FFT) that automatically uses both CPUs and GPUs to achieve optimal performance. Using a performance model, it evaluates the respective contributions of each computing unit and then makes an estimation of total execution times.…”
Section: Related Work (mentioning)
confidence: 99%
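
The last statement describes model-based work distribution only in outline. The sketch below is a rough illustration of that general idea, not the model from Ogata et al. (2008). It assumes a purely linear per-row cost for each unit, a per-row host-to-device transfer cost charged to the GPU, and fully concurrent CPU and GPU execution. The function names (`predict_time`, `best_split`) and all cost figures are hypothetical.

```python
# Illustrative only: a linear cost model, not the model from the cited paper.
# All costs are hypothetical integers in microseconds per row.

def predict_time(rows_gpu, total_rows, cpu_row_us, gpu_row_us, transfer_row_us):
    """Predicted wall time when rows_gpu rows go to the GPU and the rest to the CPU."""
    cpu_time = (total_rows - rows_gpu) * cpu_row_us
    # Assumption: the GPU pays compute plus transfer cost for each row it handles.
    gpu_time = rows_gpu * (gpu_row_us + transfer_row_us)
    # Assumption: both units run concurrently, so the slower one sets the total.
    return max(cpu_time, gpu_time)


def best_split(total_rows, cpu_row_us, gpu_row_us, transfer_row_us):
    """Search every possible split and return the GPU row count with the lowest predicted time."""
    return min(
        range(total_rows + 1),
        key=lambda r: predict_time(r, total_rows, cpu_row_us, gpu_row_us, transfer_row_us),
    )


if __name__ == "__main__":
    rows = best_split(1024, cpu_row_us=4, gpu_row_us=1, transfer_row_us=1)
    total = predict_time(rows, 1024, 4, 1, 1)
    print(f"GPU rows: {rows}, CPU rows: {1024 - rows}, predicted time: {total} us")
```

In this toy setting the predicted total is lowest where the two units finish at about the same time. A real library would have to measure or fit its cost terms on the target hardware.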