2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Confe 2012
DOI: 10.1109/hpcc.2012.44
Scalable Distributed Fast Multipole Methods

Abstract: The Fast Multipole Method (FMM) allows O(N) evaluation, to any arbitrary precision, of the N-body interactions that arise in many scientific contexts. These methods have been parallelized, with a recent set of papers attempting to parallelize them on heterogeneous CPU/GPU architectures [1]. While impressive performance was reported, the algorithms did not demonstrate complete weak or strong scalability. Further, the algorithms were not demonstrated on nonuniform distributions of particles that arise in p…

Cited by 5 publications (3 citation statements)
References 18 publications

“…We develop new data structures for the distributed algorithm which separate the computation and communication to avoid synchronization during GPU computations. The new data structures [7] build on the local essential tree (LET) [8], [9] concept but use a master-slave model and further have a novel parallel construction algorithm, in which the granularity is at the level of the spatial boxes (which allows finer parallelization than at the single-node level). Basically, each node divides its assigned domain into small spatial boxes via octrees and classifies each box into one of five categories in parallel.…”
Section: Major Contributions (mentioning)
confidence: 99%
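
The statement above describes each node dividing its assigned domain into small spatial boxes via octrees. As a rough illustration of that binning step only, here is a minimal sketch that assigns points in the unit cube to octree boxes at a fixed level using Morton (bit-interleaved) indices. The function names, the unit-cube domain, and the use of Morton ordering are assumptions made for this example, not details taken from the cited paper. The five box categories are not named in the statement, so the sketch does not attempt the classification.

```python
from collections import defaultdict


def morton_index(ix, iy, iz, level):
    """Interleave the bits of integer box coordinates into one octree index."""
    code = 0
    for bit in range(level):
        code |= ((ix >> bit) & 1) << (3 * bit)
        code |= ((iy >> bit) & 1) << (3 * bit + 1)
        code |= ((iz >> bit) & 1) << (3 * bit + 2)
    return code


def box_of(point, level):
    """Return the index of the level-`level` box containing a point in [0, 1]^3."""
    n = 1 << level  # boxes per dimension at this level
    # Clamp so that a coordinate of exactly 1.0 falls in the last box.
    coords = [min(int(c * n), n - 1) for c in point]
    return morton_index(*coords, level)


def bin_particles(points, level):
    """Group particle ids by the octree box that contains them (hypothetical helper)."""
    boxes = defaultdict(list)
    for i, p in enumerate(points):
        boxes[box_of(p, level)].append(i)
    return dict(boxes)


def parent(code):
    """Index of the parent box one level up: drop the last three bits."""
    return code >> 3
```

With this indexing, only non-empty boxes are stored, which is one common way to handle nonuniform particle distributions. Each box can be processed independently, which is the kind of box-level granularity the statement refers to.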
“…Therefore, we sought to extend our software to run on multiple nodes. There has been a lot of work done on parallelizing the FMM across many nodes (Hu et al, 2011, 2012; Lashuk et al, 2012), especially with regard to maintaining good strong and weak scaling. In many cases, multi-node FMM algorithms have been applied to the BEM (Dang et al, 2016; Malhotra and Biros, 2016; Michiels et al, 2015; Yokota et al, 2011).…”
Section: Introduction (mentioning)
confidence: 99%
“…The FMM can be efficiently parallelized [37]. The first implementation of the FMM on graphics processors [32] was developed further [38,39], where the FMM was implemented on heterogeneous computing architectures consisting of multicore CPUs and GPUs. This FMM parallelization strategy for heterogeneous architectures was successfully used in fluid and molecular dynamics [40,41,42,43,44] and in electro- and magnetostatics [45].…”
Section: Introduction (mentioning)
confidence: 99%