2016
DOI: 10.1016/j.jocs.2015.12.001

Computation–communication overlap and parameter auto-tuning for scalable parallel 3-D FFT

Abstract: Parallel 3-D FFT is widely used in scientific applications, so it is important to achieve high performance on large-scale systems with many thousands of computing cores. This paper describes a new method for scalable high-performance parallel 3-D FFT. We use a 2-D decomposition of 3-D arrays to increase scaling to a large number of cores. To achieve high performance, we use non-blocking MPI all-to-all operations and exploit computation-communication overlap. We also auto-tune our…
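The 2-D decomposition mentioned in the abstract can be illustrated with a small sketch. It assumes the common "pencil" layout, in which one axis of the array stays whole on every process and the other two axes are split across a rows-by-columns process grid. With a 1-D (slab) split, an N×N×N array can use at most N processes; a 2-D split raises that limit to N². The function names (`block_range`, `pencil_for_rank`) and the row-major rank ordering are hypothetical choices for illustration and are not taken from the paper.

```python
def block_range(n, parts, index):
    """Half-open range [start, stop) of block `index` when n items
    are split into `parts` contiguous, near-equal blocks."""
    base, extra = divmod(n, parts)
    # The first `extra` blocks each receive one additional item.
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def pencil_for_rank(rank, shape, grid):
    """Local index ranges owned by `rank` in a 2-D (pencil) decomposition.

    shape = (nx, ny, nz) is the global array size.
    grid  = (p_rows, p_cols) is the process grid (an assumption here).
    The x axis is kept whole; y is split over rows, z over columns.
    """
    nx, ny, nz = shape
    p_rows, p_cols = grid
    row, col = divmod(rank, p_cols)  # row-major rank ordering (assumed)
    return (0, nx), block_range(ny, p_rows, row), block_range(nz, p_cols, col)


if __name__ == "__main__":
    shape, grid = (8, 8, 8), (2, 4)
    for rank in range(grid[0] * grid[1]):
        print(rank, pencil_for_rank(rank, shape, grid))
```

Because each process holds full lines along x, it can run 1-D FFTs along that axis without communication; the all-to-all exchanges described in the abstract would then re-arrange the data so that the y and z axes become local in turn.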

citations
Cited by 11 publications
(2 citation statements)
references
References 17 publications
“…The progression of nonblocking communications is manually forced by inserting testing points in the overlapping window. More recently, Song et al. developed an algorithm for the 3D Fast Fourier Transform using nonblocking MPI collectives [14]. Different parameters, such as the tiling size and the frequency of MPI_Test calls to force the progression, are automatically determined in order to achieve performance.…”
Section: Asynchronous Communications In Scientific Applications (mentioning)
confidence: 99%
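The automatic parameter selection described in this statement can be sketched as a search over candidate values. The example below is a minimal illustration, assuming an exhaustive grid search over two parameters (a tile size and a number of progress-test calls per tile); the search strategy actually used in the cited work is not stated here. The names `autotune` and `measure` are hypothetical, and `measure` stands in for a timed run of the real FFT kernel.

```python
import itertools


def autotune(measure, tile_sizes, test_counts):
    """Return ((tile_size, test_count), cost) with the lowest measured cost.

    measure(tile_size, test_count) must return a cost such as elapsed
    seconds. In a real code it would wrap a timed FFT run; here it is a
    placeholder supplied by the caller.
    """
    best_params, best_cost = None, float("inf")
    for tile, tests in itertools.product(tile_sizes, test_counts):
        cost = measure(tile, tests)
        if cost < best_cost:
            best_params, best_cost = (tile, tests), cost
    if best_params is None:
        raise ValueError("no candidate parameters were supplied")
    return best_params, best_cost


if __name__ == "__main__":
    # Synthetic cost model for demonstration only (not measured data):
    # cheapest at tile size 32 with 4 test calls per tile.
    def toy_cost(tile, tests):
        return abs(tile - 32) + 0.5 * abs(tests - 4)

    print(autotune(toy_cost, [8, 16, 32, 64], [1, 2, 4, 8]))
```

Exhaustive search is only practical when the candidate lists are short; a larger parameter space would call for a cheaper strategy, which is outside the scope of this sketch.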
“…In the meantime, technological advances in hardware architectures are nearing exascale speed through co-design architectural designs, abundant General Purpose Graphical Processing Units (GPGPUs), hierarchical clustering of heterogeneous machines, and so forth. Despite the growth seen in the application sector and in the hardware architectural design sector of HPC, the performances of applications, including the The HPC community, therefore, oriented their mindset to mitigate the effects of known performance issues of large scale systems such as the dynamic nature of big data in applications (data sizes), heterogeneous hardware architectures [7], energy consumption issues, scalability issues [22,39], uncertainty of resources (including data resources), and so forth [18].…”
Section: S Benedict (mentioning)
confidence: 99%