2020
DOI: 10.1016/j.cpc.2020.107272

PittPack: An open-source Poisson’s equation solver for extreme-scale computing with accelerators

Abstract: We present a parallel implementation of a direct solver for Poisson's equation on extreme-scale supercomputers with accelerators. We introduce a chunked-pencil decomposition as the domain-decomposition strategy to distribute work among processing elements and achieve superior scalability on large numbers of accelerators. Chunked-pencil decomposition enables overlapping nodal communication and data transfer between the central processing units (CPUs) and the graphics processing units (GPUs). Second, it improve…

Citations: cited by 4 publications (1 citation statement)
References: 39 publications
“…Despite having a similar interface to the FFTW library used in the CPU version, cuFFT does not support the family of real-to-real transforms implemented in FFTW. Therefore, the different real-to-real transforms must be implemented by pre- and postprocessing FFTs [21,17,22]. Some of these transforms have been implemented in the GPU version, namely the standard fast discrete sine and cosine transforms DCT-II and DST-II (and the corresponding inverse transforms, DCT-III and DST-III), following the low-storage approach of Makhoul [21].…”
Section: Implementation of the FFT-based Transforms Using cuFFT
Citation type: mentioning
Confidence: 99%
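
The pre- and postprocessing idea in the statement above can be illustrated with a minimal sketch. It assumes the unnormalized convention X[k] = sum_n x[n] cos(pi (2n+1) k / (2N)), which is half of what FFTW's REDFT10 returns, and it uses a plain O(N^2) DFT as a stand-in for a library FFT call such as cuFFT. The function names `dft`, `dct2_via_fft`, and `dct2_direct` are hypothetical and are not taken from the citing paper or from PittPack; this is not the citing authors' GPU implementation.

```python
import cmath
import math


def dft(v):
    """Plain O(N^2) forward DFT, used here as a stand-in for a library FFT."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dct2_via_fft(x):
    """Unnormalized DCT-II of a list x using one N-point complex transform."""
    n = len(x)
    # Preprocessing: even-indexed samples in order, then odd-indexed samples reversed.
    v = x[0::2] + x[1::2][::-1]
    spectrum = dft(v)
    # Postprocessing: apply a quarter-sample phase shift and keep the real part.
    return [
        (cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
        for k in range(n)
    ]


def dct2_direct(x):
    """Reference DCT-II evaluated straight from the cosine-sum definition."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n))
        for k in range(n)
    ]
```

The reordering step means no length-2N or length-4N extended array is needed, which is the sense in which the approach is low-storage. The sketch keeps a full complex transform for readability; a production version would typically use a real-to-complex FFT of the reordered sequence instead.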