Dynamic Loop Scheduling Using MPI Passive-Target Remote Memory Access

Eleliemy, Ahmed; Ciorba, Florina M.

doi:10.1109/empdp.2019.8671619

Cited by 4 publications

(13 citation statements)

References 27 publications


“…Implementation Approaches: Hierarchical DLS techniques can be implemented either using the hierarchical master-worker model [13] or using the distributed chunk-calculation model [15]. The present work evaluates the use of two different implementations, MPI+OpenMP and MPI+MPI, to complement the distributed chunk-calculation approach (see Section 2).…”

Section: Methods (mentioning)

confidence: 99%

“…Recently, a distributed chunk-calculation approach was proposed for developing DLS techniques executing on distributed-memory systems [15]. This approach eliminated the use of the master-worker model by exploiting the one-sided communication features offered in the MPI-3 standard.…”

Section: Background and Related Work (mentioning)

confidence: 99%
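In a distributed chunk calculation, a worker needs only the current scheduling step (together with N loop iterations and P workers) to compute its own chunk, so no master process is involved. The Python sketch below illustrates this under stated assumptions. It assumes guided self-scheduling (GSS) written in a closed form, K_i = ceil((1 - 1/P)^i * N/P). The function names are hypothetical, and this is not the authors' code. In an MPI-3 implementation, the step counter would be read and incremented atomically, for example with MPI_Fetch_and_op inside a passive-target epoch.

```python
import math
from typing import List


def gss_chunk(step: int, n_iters: int, n_workers: int) -> int:
    """Chunk size for a given scheduling step, computable by any worker.

    Assumed closed form of guided self-scheduling (GSS):
        K_step = ceil((1 - 1/P) ** step * N / P), at least one iteration.
    """
    k = (1.0 - 1.0 / n_workers) ** step * n_iters / n_workers
    return max(1, math.ceil(k))


def replay_schedule(n_iters: int, n_workers: int) -> List[int]:
    """Replay the chunks claimed step by step, clamping the final chunk.

    Clamping needs the number of iterations already scheduled; here it is
    tracked locally while replaying the steps in order.
    """
    chunks: List[int] = []
    scheduled, step = 0, 0
    while scheduled < n_iters:
        size = min(gss_chunk(step, n_iters, n_workers), n_iters - scheduled)
        chunks.append(size)
        scheduled += size
        step += 1
    return chunks


print(replay_schedule(100, 4))  # [25, 19, 15, 11, 8, 6, 5, 4, 3, 2, 2]
```

Because the chunk size depends only on the step index, whichever worker draws step i computes the same K_i. Each worker therefore does its own calculation in parallel, instead of waiting for a master to do it.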

“…The proposed approach applies two DLS techniques at the intra- and inter-node levels as follows: one MPI process creates a global shared-memory region, called global work queue. This global queue stores information regarding the latest scheduling step and the total scheduled loop iterations [15]. Using MPI_Win_allocate_shared, the MPI processes within one compute node create another shared-memory region, called local work queue.…”

Section: The Hierarchical DLS Approach (mentioning)

confidence: 99%
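The global work queue described above can be sketched as a small shared record that holds the latest scheduling step and the number of iterations scheduled so far. The sketch below is a single-process Python stand-in. Threads play the role of MPI processes, and a lock plays the role of the atomic RMA update. In MPI-3 one would use, for example, MPI_Compare_and_swap or MPI_Fetch_and_op under MPI_Win_lock_all. The class and function names are hypothetical, and GSS in its recursive form is assumed as the chunking rule.

```python
import math
import threading
from typing import List, Optional, Tuple


class GlobalWorkQueue:
    """Shared record of the latest scheduling step and scheduled iterations.

    A lock stands in for the atomicity an MPI RMA window would provide.
    """

    def __init__(self, n_iters: int, n_workers: int) -> None:
        self.n_iters = n_iters
        self.n_workers = n_workers
        self.step = 0        # latest scheduling step
        self.scheduled = 0   # total loop iterations scheduled so far
        self._lock = threading.Lock()

    def claim(self) -> Optional[Tuple[int, int]]:
        """Atomically compute and reserve the next chunk; None when done."""
        with self._lock:
            remaining = self.n_iters - self.scheduled
            if remaining <= 0:
                return None
            # GSS: ceil(remaining / P), never more than what is left.
            size = min(remaining, math.ceil(remaining / self.n_workers))
            start = self.scheduled
            self.step += 1
            self.scheduled += size
            return start, size


def run(n_iters: int, n_workers: int) -> Tuple[GlobalWorkQueue, List[List[int]]]:
    queue = GlobalWorkQueue(n_iters, n_workers)
    executed: List[List[int]] = [[] for _ in range(n_workers)]

    def worker(rank: int) -> None:
        # Each worker computes its own chunk; there is no master process.
        while True:
            chunk = queue.claim()
            if chunk is None:
                return
            start, size = chunk
            executed[rank].extend(range(start, start + size))  # "execute" it

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queue, executed
```

With N = 100 and P = 4, the queue finishes after 14 scheduling steps no matter which thread claims which chunk. This holds because each GSS chunk depends only on the iterations that remain.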

“…This work proposes a novel approach for designing and developing hierarchical DLS techniques for distributed-memory systems. It extends the distributed chunk calculation approach [15] by allowing any group of workers residing on a shared-memory system to form a shared work queue in which the chunks to be executed by this group are stored. The novelty of the proposed approach lies in the fact that the responsibility of the work queue is shared among the workers of the group.…”

Section: Introduction (mentioning)

confidence: 99%
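The two-level idea can be illustrated as follows. The workers of one node share a local work queue, and whichever worker finds it empty refills it from the global queue, so no dedicated node-level master exists. The Python sketch below is a minimal model under assumptions. Threads stand in for MPI processes, and locks stand in for the RMA window and the MPI_Win_allocate_shared region. GSS is assumed at both levels, although the paper may combine different techniques, and all names are hypothetical.

```python
import math
import threading
from typing import List, Optional, Tuple


class WorkQueue:
    """A contiguous range of iterations handed out in GSS-sized chunks.

    Callers must hold `lock` while calling take().
    """

    def __init__(self, start: int, end: int, n_consumers: int) -> None:
        self.next, self.end = start, end
        self.n_consumers = n_consumers
        self.lock = threading.Lock()

    def take(self) -> Optional[Tuple[int, int]]:
        remaining = self.end - self.next
        if remaining <= 0:
            return None
        size = math.ceil(remaining / self.n_consumers)
        start = self.next
        self.next += size
        return start, size


def run_hierarchical(n_iters: int, n_nodes: int, per_node: int) -> List[List[int]]:
    global_q = WorkQueue(0, n_iters, n_nodes)  # inter-node level
    local_qs = [WorkQueue(0, 0, per_node) for _ in range(n_nodes)]  # start empty
    executed: List[List[int]] = [[] for _ in range(n_nodes * per_node)]

    def worker(node: int, wid: int) -> None:
        local_q = local_qs[node]
        while True:
            with local_q.lock:
                chunk = local_q.take()
                if chunk is None:
                    # Shared responsibility: the worker that finds the local
                    # queue empty refills it from the global queue.
                    with global_q.lock:
                        refill = global_q.take()
                    if refill is None:
                        return  # global and local queues are both exhausted
                    local_q.next, local_q.end = refill[0], refill[0] + refill[1]
                    chunk = local_q.take()
            start, size = chunk
            executed[wid].extend(range(start, start + size))

    threads = [
        threading.Thread(target=worker, args=(n, n * per_node + k))
        for n in range(n_nodes)
        for k in range(per_node)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed
```

The global lock is only ever taken while holding a local lock, never the other way round, so the two-level locking cannot deadlock.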


Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

Eleliemy

Ciorba

2019

2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Self Cite


Computationally-intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular and a balanced execution of their loop iterations is critical for achieving high performance. However, several factors may lead to an imbalanced load execution, such as problem characteristics, algorithmic, and systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors, and consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are, therefore, called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes. This feature introduced the MPI+MPI approach that simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation proposed herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach.
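DLS techniques differ mainly in how they size chunks, which trades scheduling overhead (the number of chunks) against load balance. The abstract does not name its four techniques. The Python sketch below therefore uses STATIC, SS, GSS, and TSS purely as familiar examples, not necessarily the ones evaluated. The TSS variant follows one common formulation: first chunk ceil(N/(2P)), last chunk 1, and sizes that decrease linearly. The function name is hypothetical.

```python
import math
from typing import List


def schedule(technique: str, n: int, p: int) -> List[int]:
    """Chunk sizes produced by a few classic DLS techniques (illustrative)."""
    chunks: List[int] = []
    remaining = n
    if technique == "TSS":
        first, last = math.ceil(n / (2 * p)), 1
        n_chunks = math.ceil(2 * n / (first + last))
        delta = (first - last) / (n_chunks - 1) if n_chunks > 1 else 0.0
    step = 0
    while remaining > 0:
        if technique == "STATIC":   # one equal block per worker
            size = math.ceil(n / p)
        elif technique == "SS":     # self-scheduling: one iteration per request
            size = 1
        elif technique == "GSS":    # guided: a 1/P share of what is left
            size = math.ceil(remaining / p)
        elif technique == "TSS":    # trapezoid: linearly decreasing sizes
            size = max(last, math.ceil(first - step * delta))
        else:
            raise ValueError(f"unknown technique {technique!r}")
        size = min(size, remaining)
        chunks.append(size)
        remaining -= size
        step += 1
    return chunks
```

For N = 100 and P = 4, the number of scheduling requests differs sharply between techniques. STATIC needs 4 chunks, GSS needs 14, and SS needs 100. SS balances load best but pays the most scheduling overhead.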


“…Although centralizing the chunk assignment does not mean centralizing the chunk calculation, many of the recent DLS techniques employ a master-worker execution model that centralizes both the chunk calculation and the chunk assignment at the master side [6,7,8,9,10]. The current work extends our earlier distributed chunk calculation approach (DCA) [11] and makes the following unique contributions.…”

Section: Introduction (mentioning)

confidence: 94%

A Distributed Chunk Calculation Approach for Self-scheduling of Parallel Applications on Distributed-memory Systems

Eleliemy

Ciorba

2021

Preprint

Self Cite


Loop scheduling techniques aim to achieve load-balanced executions of scientific applications. Dynamic loop self-scheduling (DLS) libraries for distributed-memory systems are typically MPI-based and employ a centralized chunk calculation approach (CCA) to assign variably-sized chunks of loop iterations. We present a distributed chunk calculation approach (DCA) that supports various types of DLS techniques. Using both CCA and DCA, twelve DLS techniques are implemented and evaluated in different CPU slowdown scenarios. The results show that the DLS techniques implemented using DCA outperform their corresponding ones implemented with CCA, especially in extreme system slowdown scenarios.


Hierarchical dynamic workload scheduling on heterogeneous clusters for grid search of inverse problems

2023


Inverse problems occur in many scientific fields. Although grid search, where points of a regular grid are tested as possible solutions, is a straightforward and robust method to numerically solve inverse problems, it is computationally intensive and becomes prohibitive when the problem has high dimensionality. Heterogeneous clusters are a viable and cost-effective solution to exploit the combined computational power of multiple available computers. In this paper, we present a computing framework that supports efficient grid search for inverse problems on heterogeneous clusters. Scheduling the workload on such systems can be challenging, especially when nodes consist of CPUs and GPUs with different computational speeds. The framework dynamically schedules computations on the processing elements of the cluster according to a selected performance index, which is determined at run-time. The framework is extensible, as it allows easy integration of additional inverse problems.
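The abstract does not say how the run-time performance index is turned into work assignments. One plausible reading is a proportional split of each batch of grid points among processing elements of different speeds. The Python sketch below shows that idea under this assumption. Both functions are hypothetical, and this is not the framework's actual policy.

```python
from typing import List


def split_batch(batch: int, perf_index: List[float]) -> List[int]:
    """Divide `batch` work items among processing elements in proportion
    to a run-time performance index (e.g. measured items per second).

    Largest-remainder rounding keeps the shares summing exactly to `batch`.
    """
    total = sum(perf_index)
    raw = [batch * s / total for s in perf_index]
    shares = [int(r) for r in raw]
    leftover = batch - sum(shares)
    # Give the remaining items to the PEs with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def measured_index(items_done: int, seconds: float) -> float:
    """Hypothetical performance index: throughput of the last batch."""
    return items_done / seconds
```

For example, if a GPU-equipped node measured three times the throughput of a CPU-only node, `split_batch(100, [1.0, 3.0])` would assign 25 and 75 grid points, respectively.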
