An integrated fine-grain runtime system for MPI

Kamal, Humaira; Wagner, Alan

doi:10.1007/s00607-013-0329-x

Cited by 11 publications

(7 citation statements)

References 14 publications


“…All of the remaining processes are configured to be free processes. These free processes are all blocked on a receive call and FG-MPI's runtime scheduler [11] ensures that they remain on a blocked queue and do not add any overhead while blocked. Skip list processes make free node requests to the co-located manager process which cooperates with the other managers to find a free process.…”

Section: Fine-grain MPI (mentioning)

confidence: 99%


A scalable distributed skip list for range queries

Alam

Kamal

Wagner

2014

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing

Self Cite


In this paper we present a distributed, message passing implementation of a dynamic dictionary structure for range queries. The structure is based on a distributed fine-grain implementation of skip lists that can scale across a cluster of multicore machines. Our implementation makes use of the unique features of Fine-Grain MPI and introduces novel algorithms and techniques to achieve scalable performance on a cluster of multicore machines. Unlike concurrent data structures the distributed skip list operations are deterministic and atomic. Range-queries are implemented in a way that parallelizes the operation and takes advantage of the recursive properties of the skip list structure. We report on the performance of the skip list for range-queries, on a medium sized cluster with two hundred cores.
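The blocked-queue behavior described in the citation statement above can be illustrated with a minimal sketch. This is not FG-MPI's API or implementation: `Scheduler`, `free_process`, and the `"recv"` request token are hypothetical names, and Python generators stand in for the co-located fine-grain processes. The point is only that a process blocked on a receive is parked and never revisited until a message for it arrives.

```python
from collections import deque

def free_process(name, log):
    # Hypothetical "free process": block on a receive until a request arrives.
    msg = yield "recv"
    log.append(f"{name} got {msg}")

class Scheduler:
    """Toy cooperative scheduler (an assumption, not FG-MPI's scheduler)."""

    def __init__(self):
        self.ready = deque()  # entries: (name, process, value to deliver)
        self.blocked = {}     # name -> process parked on a receive
        self.resumes = 0      # how many times any process was resumed

    def spawn(self, name, proc):
        self.ready.append((name, proc, None))

    def send(self, name, msg):
        # Delivering a message moves the receiver back to the ready queue.
        proc = self.blocked.pop(name)
        self.ready.append((name, proc, msg))

    def run(self):
        # Only the ready queue is visited; blocked processes cost nothing here.
        while self.ready:
            name, proc, value = self.ready.popleft()
            self.resumes += 1
            try:
                request = proc.send(value)
            except StopIteration:
                continue  # process finished
            if request == "recv":
                self.blocked[name] = proc

log = []
sched = Scheduler()
for i in range(3):
    sched.spawn(f"free{i}", free_process(f"free{i}", log))
sched.run()                     # each process runs once, then blocks
sched.send("free1", "request")  # wake exactly one free process
sched.run()                     # only free1 is resumed
```

After the second `run()`, only `free1` has been resumed again; `free0` and `free2` remain parked in `blocked` and were not touched.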


“…A crucial element of our design is the use of Fine-Grain MPI [11] (FG-MPI). FG-MPI extends MPI and makes it possible to express and exploit finer-grain, function-level concurrency and parallelism by allowing for multiple MPI processes inside an OS-process.…”

Section: Introduction (mentioning)

confidence: 99%

A scalable distributed skip list for range queries

Alam

Kamal

Wagner

2014

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing

Self Cite


“…Kamal et al [18] make use of User Level Threads (ULT) in the MPICH 2 [12] to build an MPI-aware scheduler for coroutines that are swapped in and out for execution depending on the status of the MPI runtime. Lu et al [21] follow a similar approach by doing the context switch of ULTs inside the MPI to avoid the expensive MPI locking operations.…”

Section: Related Work (mentioning)

confidence: 99%

Optimizing computation-communication overlap in asynchronous task-based programs

Castillo

Jain

Casas

et al. 2019

Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming


Asynchronous task-based programming models are gaining popularity to address the programmability and performance challenges in high performance computing. One of the main attractions of these models and runtimes is their potential to automatically expose and exploit overlap of computation with communication. However, we find that inefficient interactions between these programming models and the underlying messaging layer (in most cases, MPI) limit the achievable computation-communication overlap and negatively impact the performance of parallel programs. We address this challenge by exposing and exploiting information about MPI internals in a task-based runtime system to make better task-creation and scheduling decisions. In particular, we present two mechanisms for exchanging information between MPI and a task-based runtime, and analyze their trade-offs. Further, we present a detailed evaluation of the proposed mechanisms implemented in MPI and a task-based runtime. We show performance improvements of up to 16.3% and 34.5% for proxy applications with point-to-point and collective communication, respectively.


“…The increasing complexity of Multiprocessor System-on-Chip (MPSoC) drives the needs for system software development. To exploit the computation capability of MPSoC, fine-grained task models like Intel's TBB [1], Cilk++ [2], Fine-grain MPI (FG-MPI) [3] and Simulink [4] have been proposed to expose the computation parallelism, which provides more chances for system performance optimization, including easier load balancing, greater potential for overlapping communication and computation, and improved platform-independence [5].…”

Section: Introduction (mentioning)

confidence: 99%

“…However, it is unclear which tasks and how many cycles of each task to be preprocessed, which requires theoretical guidance for users. For the scheduling challenge, existing scheduling approaches on fine-grained models [1]- [3], [5] mainly focus on runtime implementation, but design-time (i.e. static) approaches are also important.…”

Section: Introduction (mentioning)

confidence: 99%

Fine-Grained Communication-Aware Task Scheduling Approach for Acyclic and Cyclic Applications on MPSoCs

Huang

Jiang

et al. 2019

IEEE Access


Fine-grained task models can exploit parallelism to achieve high performance for multiprocessor system-on-chip (MPSoC). However, fine-grained models face the issues of high-communication overhead and difficult scheduling decisions, and the two challenges are interdependent. To address the issues, this paper gives a full analysis of the fine-grained communication optimization technique and communication pipeline, from both time and topology perspectives, and proposes a static fine-grained communication-aware task scheduling (FCATS) approach, which integrates scheduling with communication pipeline for acyclic and cyclic applications based on the fine-grained Simulink model. The approach contains search-based scheduling with high-quality solutions utilizing genetic algorithm-integer linear programming (GA-ILP) and hybrid GA-heuristic scheduling with short solving time to meet different demands for users. The experimental results with both synthetic and real-life benchmarks on the 4/8/16-CPU platform demonstrate the efficiency of the approach on performance improvements compared to previous works.
