Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2017
DOI: 10.1145/3126908.3126963
Why is MPI so slow?

Cited by 23 publications (4 citation statements). References 14 publications.
“…The data is normalized by reporting the number of DoF per node, so ideal weak scaling would correspond to coinciding lines. While the saturated performance is scaling well, giving a sustained performance of up to 4.4 PFlop/s, most of the in-cache performance advantage is lost due to the communication latency over MPI, see also [62] for limits with MPI in PDE solvers. Defining the strong scaling limit as the point where throughput reduces to 80% of saturated performance [29], it is reached for wall times of 56 μs on 1 node.…”
Section: Performance-optimized Conjugate Gradient Methods
confidence: 99%
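The 80% criterion quoted above can be made concrete with a short sketch: scan per-node throughput measurements from small to large local problem sizes and report the first point that reaches 80% of the saturated value, together with the implied wall time. The arrays below are hypothetical placeholders, not measurements from the cited work, and the wall-time estimate simply divides DoF by throughput.

/* Sketch: locating the strong scaling limit, defined as the point
 * where throughput drops to 80% of saturated performance.
 * All numbers here are hypothetical, not from the cited paper. */
#include <stdio.h>

int main(void)
{
    /* hypothetical per-node data: local problem size (DoF) and
       achieved throughput (DoF per second) */
    const double dofs[]       = { 1e4,   3e4,   1e5,   3e5,   1e6,   3e6 };
    const double throughput[] = { 0.9e9, 2.2e9, 3.4e9, 4.1e9, 4.3e9, 4.4e9 };
    const int n = (int)(sizeof dofs / sizeof dofs[0]);

    /* saturated performance: throughput at the largest local size */
    const double saturated = throughput[n - 1];
    const double limit     = 0.8 * saturated;

    /* walk from small to large local sizes; the first point at or
       above 80% of saturation marks the strong scaling limit */
    for (int i = 0; i < n; ++i) {
        if (throughput[i] >= limit) {
            printf("strong scaling limit near %.1e DoF/node, "
                   "wall time approx. %.1f us\n",
                   dofs[i], 1e6 * dofs[i] / throughput[i]);
            return 0;
        }
    }
    return 0;
}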
“…Nekbone has been updated to include vector solutions, which allows amortization of message and memory latencies. Nekbone has been used for assessment of advanced architectures and for evaluation of light-weight MPI implementations on the ALCF BG/Q, Cetus, in collaboration with Argonne's MPICH team (Raffenetti et al. 2017).…”
Section: Nekbench and Nekbone
confidence: 99%
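The "vector solutions" idea can be illustrated with a minimal, hedged MPI sketch: the halo values of several right-hand-side vectors are packed contiguously and moved in a single MPI_Sendrecv, so the per-message latency is paid once rather than once per vector. The buffer layout, sizes, and ring exchange below are illustrative assumptions, not Nekbone's actual interface.

/* Minimal sketch: batching several solution vectors into one
 * halo-exchange message to amortize message latency.
 * HALO and NVEC are hypothetical sizes, not Nekbone parameters. */
#include <mpi.h>
#include <stdio.h>

#define HALO 64   /* halo entries per vector (hypothetical) */
#define NVEC 8    /* number of simultaneous solution vectors */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sendbuf[HALO * NVEC], recvbuf[HALO * NVEC];
    for (int i = 0; i < HALO * NVEC; ++i)
        sendbuf[i] = (double)rank;          /* dummy halo data */

    int right = (rank + 1) % size;          /* simple ring partners */
    int left  = (rank - 1 + size) % size;

    /* one message carries the halos of all NVEC vectors at once */
    MPI_Sendrecv(sendbuf, HALO * NVEC, MPI_DOUBLE, right, 0,
                 recvbuf, HALO * NVEC, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("exchanged %d doubles in a single message\n", HALO * NVEC);

    MPI_Finalize();
    return 0;
}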
“…Parallel applications are the dominant workload in high-performance computing (HPC) systems. Many of these parallel programs run across multiple compute nodes and processors and use the Message Passing Interface (MPI) for distributed communications and work distribution [1]-[3]. Effective management of MPI applications is thus vital for improving system utilization and application performance for HPC systems.…”
Section: Introduction
confidence: 99%