Scientific parallel applications often use MPI for inter-node communication and OpenMP for intra-node parallelism. Applications such as particle transport, seismic wave propagation, or finite-element codes frequently exhibit workload imbalance caused by the ongoing data movement inherent to these methods. To reduce this imbalance, they typically implement software load-balancing strategies triggered when an imbalance threshold is detected. Such mechanisms are complex to implement and affect the performance of the entire distributed application, since they require synchronizing and exchanging load over the network. This paper proposes a method to dynamically detect load imbalance and rebalance the computation by redistributing OpenMP threads among the MPI processes co-located on a node. With minimal changes to the application code, we show that this technique improves overall application performance by up to 28% on MiniFE, 17% on Quicksilver, and 3% on Ondes3D. We also present its impact when executing on multiple nodes and discuss the limitations of the proposed approach.
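
To make the idea concrete, the sketch below illustrates one possible way intra-node thread redistribution could be driven: MPI ranks sharing a node (obtained via MPI_Comm_split_type) exchange the duration of their last compute phase and reassign the node's cores in proportion to their measured load with omp_set_num_threads. This is a minimal, hypothetical illustration under assumptions of our own (the rebalance_threads helper, the proportional-split policy, and the per-node rank limit), not the mechanism evaluated in the paper.

/* Sketch: redistribute OpenMP threads among MPI ranks co-located on a node,
 * proportionally to each rank's measured compute time. Illustrative only. */
#include <mpi.h>
#include <omp.h>

static void rebalance_threads(MPI_Comm node_comm, double my_time, int cores_per_node)
{
    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    double times[64];                          /* assumes at most 64 ranks per node */
    MPI_Allgather(&my_time, 1, MPI_DOUBLE,
                  times, 1, MPI_DOUBLE, node_comm);

    /* Give each rank a share of the cores proportional to its load. */
    double total = 0.0;
    for (int i = 0; i < node_size; i++)
        total += times[i];

    int my_threads = (int)(cores_per_node * (my_time / total) + 0.5);
    if (my_threads < 1)
        my_threads = 1;

    omp_set_num_threads(my_threads);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Communicator containing only the MPI ranks sharing this node. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    for (int step = 0; step < 10; step++) {
        double t0 = MPI_Wtime();
        /* ... OpenMP-parallel compute phase of the application ... */
        double my_time = MPI_Wtime() - t0;

        rebalance_threads(node_comm, my_time, omp_get_num_procs());
    }

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}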