Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering 2016
DOI: 10.1145/2851553.2851575

Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers

Abstract: On large supercomputers, the job scheduling systems may assign a non-contiguous node allocation for user applications depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of physical network topology and impacts communication performance of the application. In order to mitigate such performance penalties, t…
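The locality problem described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the 8x8 torus, the node coordinates, the chain communication pattern, and the helper names (`torus_hops`, `chain_cost`, `greedy_reorder`) are all hypothetical, chosen only to show how a non-contiguous allocation combined with the default rank order can inflate hop counts, and how even a simple greedy reordering of ranks onto the same nodes can reduce them.

```python
# Illustrative sketch only: a toy 2-D torus and a greedy heuristic,
# not the mapping technique used in the paper.

def torus_hops(a, b, dims):
    """Minimal hop count between coordinates a and b on a torus of size dims."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def chain_cost(nodes, dims):
    """Total hops when rank i exchanges messages with rank i + 1."""
    return sum(torus_hops(p, q, dims) for p, q in zip(nodes, nodes[1:]))

def greedy_reorder(nodes, dims):
    """Start at the first node, then repeatedly append the closest unused node."""
    remaining = list(nodes)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda n: torus_hops(order[-1], n, dims))
        remaining.remove(nxt)
        order.append(nxt)
    return order

# Hypothetical non-contiguous allocation, listed in scheduler order.
dims = (8, 8)
allocation = [(0, 0), (5, 3), (0, 1), (5, 4), (1, 1), (6, 4)]

default_cost = chain_cost(allocation, dims)   # rank i placed on allocation[i]
mapped = greedy_reorder(allocation, dims)     # same nodes, different rank order
mapped_cost = chain_cost(mapped, dims)

print("default order hops:", default_cost)
print("reordered hops:    ", mapped_cost)
```

Real mapping tools work from the application's measured communication graph and the machine's actual topology; the greedy chain here stands in for that step only to make the cost difference visible.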

Cited by 6 publications (2 citation statements)
References 21 publications
“…Figure 1 demonstrates runtime variability when running fast Fourier Transform programs [9,10] for solving the Navier-Stokes equations on Hazelhen and Shaheen II. Both these computers use adaptive routing and job placement to speed up job throughput [18], but this can result in some run time variability, a subject of current research [7,28,35,38,39].…”
Section: Lessons Learned
mentioning
confidence: 99%
“…As a result, hypergraph partitioning also leads to a reduction in parallel communication requirements, albeit at a larger one-time setup cost. Topology-aware task mapping is used to accurately map partitions to the allocated nodes of a supercomputer, reducing the overall cost associated with communication [11,12,13,14,15]. The approach introduced in this paper complements these efforts by providing an additional level of optimization in handling communication. Topology-aware methods and aggregation of data are commonly used to reduce communication costs, particularly in collective operations [16,17,18,19].…”
mentioning
confidence: 99%