Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis 2011
DOI: 10.1145/2063384.2063432

Scalable fast multipole methods on distributed heterogeneous architectures

Abstract: We fundamentally reconsider implementation of the Fast Multipole Method (FMM) on a computing node with a heterogeneous CPU-GPU architecture with multicore CPU(s) and one or more GPU accelerators, as well as on an interconnected cluster of such nodes. The FMM is a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a timestepping or iterative loop. Using the observation that the local summation and the analysis-based translation parts of the FMM are in…
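For context, the N-body sum that the FMM accelerates can be written as a direct double loop. The sketch below is only that brute-force baseline, not the FMM and not the authors' code. It assumes a 1/r (Laplace) kernel, which the abstract does not specify, and the function name `direct_potential` is hypothetical.

```python
import math


def direct_potential(sources, strengths, targets):
    """Brute-force N-body sum: phi(y) = sum_i q_i / |y - x_i|.

    Assumption: a 1/r kernel, chosen for illustration only.
    Cost is O(N * M) for N sources and M targets, which is the
    cost a fast method such as the FMM is designed to reduce.
    """
    potentials = []
    for y in targets:
        total = 0.0
        for x, q in zip(sources, strengths):
            r = math.dist(x, y)  # Euclidean distance (Python 3.8+)
            if r > 0.0:  # skip the singular term when target == source
                total += q / r
        potentials.append(total)
    return potentials


if __name__ == "__main__":
    sources = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    strengths = [1.0, 2.0]
    targets = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(direct_potential(sources, strengths, targets))
```

The quadratic cost of this loop is what makes a spatial decomposition attractive once the number of points grows large.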

Citations: cited by 39 publications (47 citation statements)
References: 24 publications
“…The parameter k_max used in D-M2L [9] was set to 18. Parameters M(k), k = 1 to k_max, were set to 6, 8, 12, 16, 20, 26, 30, 34, 38, 44, 48, 52, 56, 60, 60, 52, 4, and 2. Odd numbers were avoided for M(k) to improve the calculation efficiency [16].…”
Section: CPU Codes (citation type: mentioning)
confidence: 99%
“…Starting from [1], we design a new scalable heterogeneous FMM algorithm, which fully distributes all the translations among nodes and substantially decreases its communication costs. This is a consequence of the new data structures which separate the computation and communication to avoid synchronization during GPU computations.…”
Section: A Present Contribution (citation type: mentioning)
confidence: 99%
“…Implementation details for importing or exporting data via LETs are not explicitly described in the well-known distributed FMM papers, such as [8], [9], [11], [12]. Recently, [1] developed a distributed FMM algorithm for heterogeneous clusters. However, their algorithm repeated part of the translation computations among nodes and required a coefficient exchange for all the spatial boxes at the octree's bottom level.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Special-purpose hardware such as graphics processors or heterogeneous CPU/GPU architectures also allows the fast computation of finite sums, either via brute-force summation [18], or via the mapping of the FMM onto these architectures [19, 20, 21, 22]. Yokota et al. [22] favorably compare a large-scale FMM-based vortex element computation with a direct numerical simulation via periodic pseudospectral methods.…”
Section: Introduction (citation type: mentioning)
confidence: 99%