2014
DOI: 10.1007/s11227-014-1133-x
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

Cited by 18 publications (11 citation statements)
References 30 publications
“…Gahvari and Gropp [73] apply performance models to study two exascale use-cases (FFT and multigrids) and conclude that scalability of these applications is limited by interconnect bandwidth. Hasanov et al. [88] present a hierarchical approach to distributed matrix multiplication. Evaluation on existing systems (Grid'5000 and BlueGene/P) and prediction for future exascale systems show better performance than the state-of-the-art.…”
Section: Scalable
confidence: 99%
“…Hierarchical optimization has been recently applied in the development of linear algebra routines for clusters or distributed systems. Hasanov et al. apply a hierarchical approach to improve SUMMA on large-scale distributed platforms with components at different speeds. In our case, heterogeneity is considered between the nodes in the system and also within each node (CPU + coprocessors).…”
Section: Work In Process
confidence: 99%
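As a rough illustration of the two-level arrangement described in the statement above, the sketch below splits the MPI processes into groups and builds a square 2D grid with row and column communicators inside each group, which is the structure a SUMMA-style multiplication runs over. This is only a structural sketch under assumed parameters (the group count, the divisibility of the process count, and all identifiers are illustrative), not the code of Hasanov et al.

/* Two-level process arrangement: groups at the top level, a square
   process grid with row/column communicators inside each group. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    const int num_groups = 4;        /* assumed; tuned per platform in practice */
    int per_group = p / num_groups;  /* assumes p is divisible by num_groups */
    int group = rank / per_group;

    /* Top level: one communicator per group of processes. */
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, group, rank, &group_comm);

    /* Bottom level: a square q x q grid inside each group (assumes per_group
       is a perfect square); the classic SUMMA steps would run on this grid. */
    int q = (int)sqrt((double)per_group);
    int grid_rank;
    MPI_Comm_rank(group_comm, &grid_rank);
    int my_row = grid_rank / q;
    int my_col = grid_rank % q;

    /* Row and column communicators, used by SUMMA to broadcast panels of A
       along grid rows and panels of B along grid columns. */
    MPI_Comm row_comm, col_comm;
    MPI_Comm_split(group_comm, my_row, my_col, &row_comm);
    MPI_Comm_split(group_comm, my_col, my_row, &col_comm);

    printf("rank %d -> group %d, grid position (%d, %d)\n",
           rank, group, my_row, my_col);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}

On such a hierarchy, panels of A and B would be broadcast along row_comm and col_comm within each group, with an outer exchange between groups handling the top level of the computation.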
“…This optimization technique is inspired by our previous study on the optimization of the communication cost of parallel matrix multiplication on large-scale distributed memory platforms [16].…”
Section: Hierarchical Optimization of MPI Broadcast Algorithms
confidence: 99%
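The hierarchical broadcast optimization referenced here follows a two-level pattern: the processes are split into groups, the root broadcasts to one leader per group, and each leader rebroadcasts within its own group. Below is a minimal MPI sketch of that pattern; the group count, the contiguous grouping by rank, and the payload are illustrative assumptions, not details taken from the cited work.

/* Two-level (hierarchical) broadcast: root -> group leaders -> group members. */
#include <mpi.h>
#include <stdio.h>

#define NUM_GROUPS 4  /* illustrative choice; tuned per platform in practice */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split processes into NUM_GROUPS contiguous groups of ranks. */
    int group = rank / ((size + NUM_GROUPS - 1) / NUM_GROUPS);
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, group, rank, &group_comm);

    int group_rank;
    MPI_Comm_rank(group_comm, &group_rank);

    /* Leaders (local rank 0 of each group) form a separate communicator. */
    MPI_Comm leader_comm;
    MPI_Comm_split(MPI_COMM_WORLD, group_rank == 0 ? 0 : MPI_UNDEFINED,
                   rank, &leader_comm);

    double data = (rank == 0) ? 42.0 : 0.0;  /* illustrative payload */

    /* Level 1: the global root broadcasts to the group leaders. */
    if (leader_comm != MPI_COMM_NULL)
        MPI_Bcast(&data, 1, MPI_DOUBLE, 0, leader_comm);

    /* Level 2: each leader rebroadcasts inside its own group. */
    MPI_Bcast(&data, 1, MPI_DOUBLE, 0, group_comm);

    printf("rank %d received %.1f\n", rank, data);

    if (leader_comm != MPI_COMM_NULL) MPI_Comm_free(&leader_comm);
    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}

With G groups, the root's broadcast fans out over G leaders rather than all processes, and the intra-group broadcasts proceed in parallel, which is what makes this kind of two-level scheme attractive at large scale.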
“…In contrast to this approach, we focus on the scale of HPC platforms rather than their specific complexity and propose a solution that is universally applicable to all HPC platforms. The proposed solution is inspired by our previous study on parallel matrix multiplication on large-scale distributed memory platforms [16]. It provides a simple and general technique to optimize legacy scientific MPI-based applications without redesigning them.…”
Section: Introduction
confidence: 99%