2015
DOI: 10.1007/978-3-319-15976-8_10

Performance Analysis of Irregular Collective Communication with the Crystal Router Algorithm

Abstract: In order to achieve exascale performance it is important to detect potential bottlenecks and identify strategies to overcome them. For this, both applications and system software must be analysed and potentially improved. The EU FP7 project Collaborative Research into Exascale Systemware, Tools & Applications (CRESTA) chose the approach to co-design advanced simulation applications and system software as well as development tools. In this paper, we present the results of a co-design activity focused on the sim…

Citations: cited by 3 publications (2 citation statements)
References: 10 publications
“…The crystal router algorithm performs the nearest neighbor collective via recursive hypercube folding. The algorithm is described in full by Lamb et al (1988), and some performance results of this algorithm’s use in Nek5000 were presented by Schliephake and Laure (2014). For a P process grid, the algorithm can be summarized as follows: divide the grid in half and pair each process p_l in the lower half with a distinct process p_h in the upper half.…”
Section: The Nekbone Benchmark
Citation type: mentioning (confidence: 99%)
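
The quoted summary breaks off after the pairing step, so what follows is only a minimal sketch of the general idea, not the Nek5000 or Nekbone implementation. It simulates the routing serially in Python under several assumptions that are not in the quoted text: the process count is a power of two, the partner of process p at each stage is p XOR half, a message crosses to the partner exactly when its destination lies in the other half of the current group, and each half is then folded again. The name `crystal_route` and the `(dest, payload)` message layout are hypothetical.

```python
def crystal_route(outboxes):
    """Serially simulate a crystal-router-style exchange.

    outboxes[p] is a list of (dest, payload) messages that start on process p.
    Returns (inboxes, stages): inboxes[p] holds the messages that end on p,
    and stages is the number of pairwise exchange rounds performed.
    Assumption: len(outboxes) is a power of two.
    """
    num_procs = len(outboxes)
    if num_procs == 0 or num_procs & (num_procs - 1):
        raise ValueError("this sketch assumes a power-of-two process count")

    held = [list(msgs) for msgs in outboxes]  # messages currently on each process
    stages = 0
    half = num_procs // 2
    while half >= 1:
        next_held = [[] for _ in range(num_procs)]
        for p in range(num_procs):
            partner = p ^ half  # lower-half process paired with an upper-half one
            for dest, payload in held[p]:
                if (dest & half) != (p & half):
                    # Destination is in the other half: hand over to the partner.
                    next_held[partner].append((dest, payload))
                else:
                    # Destination is in this half: keep it for later stages.
                    next_held[p].append((dest, payload))
        held = next_held
        stages += 1
        half //= 2  # fold: recurse into each half independently
    return held, stages


if __name__ == "__main__":
    # Irregular pattern on 8 processes: process p sends one message to (p + 3) % 8.
    out = [[((p + 3) % 8, f"from{p}")] for p in range(8)]
    inboxes, stages = crystal_route(out)
    print(stages, inboxes)
```

Under these assumptions every message reaches its destination after log2(P) exchange rounds, because each round fixes one bit of the difference between the holder's rank and the destination rank. In a real MPI code each round would be a pairwise send and receive between partners rather than a loop over all processes.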
“…This is similar to the Global Segment Map (GSMap) in MCT, which in contrast is stored in every processor, leading to O(N_x) memory requirements. The parallel communication infrastructure in MOAB is heavily leveraged (Tautges et al, 2012) to utilize the scalable crystal router algorithm (Fox et al, 1989; Schliephake and Laure, 2015) in order to scalably communicate the covering cells to different processors. This parallel mesh infrastructure in MOAB provides the necessary algorithmic tools for optimally executing online remapping strategies, so that MCT in E3SM can be replaced with a MOAB-based coupler.…”
Section: Algorithmic Approach
Citation type: mentioning (confidence: 99%)