2018
DOI: 10.1007/s11227-018-2356-z

Improving all-reduce collective operations for imbalanced process arrival patterns

Abstract: Two new algorithms for the all-reduce operation, optimized for imbalanced process arrival patterns (PAPs), are presented: (i) sorted linear tree (SLT) and (ii) pre-reduced ring (PRR). A new method of on-line PAP detection is also introduced, including process arrival time (PAT) estimation and the distribution of these estimates among cooperating processes. The idea, pseudo-code, implementation details, a benchmark for performance evaluation, and a real case example for machine learning are provided. The results of the experimen…
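To illustrate why the order of process arrival matters for a reduction, here is a minimal sketch under a toy cost model. It is not the paper's SLT or PRR algorithm, which the abstract does not specify in detail. The function name `chain_finish_time`, the fixed per-hop cost, and the arrival times are all assumptions made for this example.

```python
from typing import Sequence


def chain_finish_time(arrivals: Sequence[float], step_cost: float) -> float:
    """Finish time of a linear (chain) reduction visiting processes in the given order.

    Toy model (an assumption, not taken from the paper): the partial result
    moves along the chain; each hop waits until the next process has arrived,
    then pays `step_cost` to transfer and combine the data.
    """
    if not arrivals:
        raise ValueError("need at least one process")
    finish = arrivals[0]  # the first process only holds its own data
    for t in arrivals[1:]:
        finish = max(finish, t) + step_cost
    return finish


# Hypothetical arrival times in arbitrary units; rank 0 is the straggler.
arrivals = [5.0, 0.0, 1.0, 2.0]
rank_order = chain_finish_time(arrivals, step_cost=1.0)
arrival_order = chain_finish_time(sorted(arrivals), step_cost=1.0)
print(rank_order, arrival_order)  # prints "8.0 6.0"
```

In this model, visiting processes in arrival order lets the early arrivals combine their data while the straggler is still missing, so only one hop remains after the last process arrives.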

Citations: cited by 8 publications (17 citation statements)
References: 13 publications
“…In [23] we proposed two new, hardware agnostic, allreduce algorithms optimized for imbalanced PAP occurrence, the solution included a PAP detection mechanism based on progress monitoring by an additional background thread placed in every process participating in the collective operation. A benchmark evaluating the performance of the algorithms was described and experimental results comparing with other typically used algorithms were provided.…”
Section: Optimization Of Collectives With Imbalanced PAPs
Type: mentioning
confidence: 99%
“…The implementation uses C language (v. C99, compiled by GCC v. 7.3.0 with -O3 optimization), with OpenMPI [10] (v. 3.0.0) for processes/nodes message exchange, POSIX Threads [4] (v. 2.12) for intranode communication and synchronization, and GLibc (v. 2.0) for dynamic data structures' management. The similar approach was used in [23].…”
Section: Experimental Evaluation
Type: mentioning
confidence: 99%
“…In our case we optimize collective operations by providing some algorithms (partially) resilient to imbalanced PAP environment, e.g. [15,16].…”
Section: Introduction
Type: mentioning
confidence: 99%