2018 IEEE/ACM Parallel Applications Workshop, Alternatives to MPI (PAW-ATM) 2018
DOI: 10.1109/paw-atm.2018.00006

Efficient Algorithms for Collective Operations with Notified Communication in Shared Windows

Abstract: Collective operations are commonly used in various parts of scientific applications. Especially in strong-scaling scenarios, collective operations can negatively impact the overall application performance: while the load per rank decreases with increasing core counts, the time spent in, e.g., barrier operations increases logarithmically with the core count. In this article, we develop novel algorithmic solutions for collective operations, such as Allreduce and Allgather(V), by leveraging notified communication…
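
The abstract's central ingredient, notified communication inside node-local shared windows, can be pictured with a small shared-memory sketch. The C snippet below is an illustrative assumption rather than the paper's implementation or API: it emulates a "write plus notify" between two POSIX threads sharing a slot, where the producer writes a payload and then publishes a notification flag with release semantics, and the consumer waits on the flag with acquire semantics before reading. The struct name notified_slot_t and the single-slot layout are hypothetical.

```c
/* Illustrative sketch only: emulates "notified communication" inside a
 * node-local shared window using C11 atomics and two POSIX threads.
 * The buffer layout (payload + notification flag) and all names here are
 * assumptions for illustration, not the paper's API. Compile with -pthread. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

typedef struct {
    double     payload;       /* data slot inside the shared window      */
    atomic_int notification;  /* 0 = empty, 1 = payload has been written */
} notified_slot_t;

static notified_slot_t slot = { 0.0, 0 };

/* Producer: write the payload first, then publish the notification.
 * memory_order_release ensures the payload store becomes visible before
 * the flag store, mirroring the ordering a notified write provides.     */
static void *producer(void *arg) {
    (void)arg;
    slot.payload = 42.0;
    atomic_store_explicit(&slot.notification, 1, memory_order_release);
    return NULL;
}

/* Consumer: wait for the notification, then read the payload.
 * memory_order_acquire pairs with the release store above.              */
static void *consumer(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&slot.notification, memory_order_acquire) == 0)
        ;  /* busy-wait; a real library would back off or block */
    printf("received payload %.1f\n", slot.payload);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

The release/acquire pairing stands in for the guarantee that the notification only becomes visible after the associated data has landed; in the paper this role is played by notified communication over the interconnect or a shared window rather than by thread-level atomics.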

Cited by 2 publications (2 citation statements)
References 14 publications
“…This implementation delivered up to a 3x performance boost compared to the best Intel MPI implementation, v5.1.2, on the Salomon IT4I cluster (InfiniBand FDR). We extended this idea to Allgather(V) and Allreduce with an adaptation of the dissemination algorithm [17], achieving 2x-4x performance improvements compared to the best-performing MPI implementations on the Salomon IT4I cluster and the Beskow Cray XC40 cluster at PDC, KTH.…”
Section: Results (mentioning)
confidence: 99%
“…This implementation delivered up to a 3x performance boost compared to the best Intel MPI implementation, v5.1.2, on the Salomon IT4I cluster (InfiniBand FDR). We extended this idea to Allgather(V) and Allreduce with an adaptation of the dissemination algorithm [16], achieving 2x-4x performance improvements compared to the best-performing MPI implementations on the Salomon IT4I cluster and the Beskow Cray XC40 cluster at PDC, KTH.…”
Section: D) Consistent Allreduce (mentioning)
confidence: 99%
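
Both citation statements point to an adaptation of the dissemination algorithm. For reference, the sketch below shows the classic dissemination exchange pattern, here realized as a plain two-sided MPI barrier: with P ranks it completes in ceil(log2 P) rounds, and in round k each rank signals the rank 2^k positions ahead while waiting on the rank 2^k positions behind. This is a generic illustration of the pattern only; the cited work adapts it to Allgather(V) and Allreduce with notified one-sided communication, which this sketch does not implement.

```c
/* Generic illustration of the dissemination exchange pattern (as a barrier)
 * using ordinary two-sided MPI calls. The cited papers adapt this pattern to
 * Allgather(V)/Allreduce with notified communication; this sketch does not. */
#include <mpi.h>
#include <stdio.h>

static void dissemination_barrier(MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* ceil(log2(size)) rounds: in round k, exchange with ranks at distance 2^k. */
    for (int dist = 1; dist < size; dist <<= 1) {
        int to   = (rank + dist) % size;         /* rank we signal            */
        int from = (rank - dist + size) % size;  /* rank we wait for          */
        char send_token = 0, recv_token;
        MPI_Sendrecv(&send_token, 1, MPI_CHAR, to,   0,
                     &recv_token, 1, MPI_CHAR, from, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    dissemination_barrier(MPI_COMM_WORLD);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d passed the dissemination barrier\n", rank);
    MPI_Finalize();
    return 0;
}
```

One reason the dissemination pattern is attractive for collectives is that it does not require a power-of-two rank count: the round count remains ceil(log2 P) for any P.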