Proceedings of Scalable Parallel Libraries Conference
DOI: 10.1109/splc.1993.365573

Comparison of scalable parallel matrix multiplication libraries

Abstract: This paper compares two general library routines for performing parallel distributed matrix multiplication. The PUMMA algorithm utilizes a block-scattered data layout, whereas BiMMeR utilizes a virtual 2-D torus wrap. The algorithmic differences resulting from these different layouts are discussed, as well as the general issues associated with different data layouts for library routines. Results on the Intel Delta for the two matrix multiplication algorithms are presented.

Cited by 17 publications
(16 citation statements)
References 9 publications
“…However, it is competitive, or faster, and, given its simplicity and flexibility, warrants consideration. Moreover, the implementations by Huss-Lederman et al [7,8] are competitive with PUMMA, and would thus compare similarly with SUMMA. Also, our method is presented in a slightly simplified setting and thus the performance may be slightly better than it would be if we implemented exactly for the cases for which PUMMA and the algorithm by Huss-Lederman et al were designed.…”
Section: Results
confidence: 99%
“…Two recent efforts extend the work by Fox et al. to general meshes of nodes: the paper by Choi et al. [6] uses a two-dimensional block-wrapped (block-cyclic) data decomposition, while the papers by Huss-Lederman et al. [7,8] use a 'virtual' 2-D torus wrap data layout. Both these efforts report very good performance attained on the Intel Touchstone Delta, achieving a sizeable percentage of peak performance.…”
Section: Introduction
confidence: 99%
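The two-dimensional block-cyclic (block-wrapped) decomposition mentioned above can be illustrated with a minimal sketch. This is not code from PUMMA or BiMMeR; the function name and parameters are hypothetical, and it only shows the standard owner-computes mapping: the matrix is cut into square blocks, and block rows and columns wrap cyclically over a P × Q process grid.

```python
def block_cyclic_owner(i, j, block, P, Q):
    """Return the (row, col) coordinates of the process on a P x Q grid
    that owns global matrix element (i, j) under a 2-D block-cyclic
    layout with square blocks of size `block` (illustrative sketch)."""
    return (i // block) % P, (j // block) % Q

# Example: an 8x8 matrix in 2x2 blocks on a 2x2 grid.
# Block (0,0) lives on process (0,0); block (2,1) wraps back to (0,1).
owners = [[block_cyclic_owner(i, j, 2, 2, 2) for j in range(8)]
          for i in range(8)]
```

Cyclic wrapping of blocks is what gives this layout its load balance: consecutive block rows land on different process rows, so triangular or banded work spreads evenly across the grid.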
“…We discuss a permutation compatible data distribution (i.e. virtual 2D torus wrap data distribution [21,7]), which is used to distribute matrices on two-dimensional process grid topologies. Finally, we introduce a modified virtual 2D data distribution that can solve the potential load imbalance problem induced by the virtual 2D torus wrap data distribution.…”
Section: Permutation Compatible Data Distributions
confidence: 99%
“…For a non-square grid G P×Q , we can view it as a α × α virtual grid [21,7], where α is the least common multiple of P and Q. Then we distribute matrices on this α × α virtual grid [7].…”
Section: The Virtual 2-Dimensional Grid
confidence: 99%
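The virtual-grid construction quoted above can be sketched as follows. This is an assumption-laden illustration, not the cited papers' exact scheme: it computes α = lcm(P, Q) and folds each virtual coordinate back onto a physical process by simple modular wrapping.

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def physical_process(r, c, P, Q):
    """Fold virtual-grid coordinate (r, c) on the alpha x alpha grid
    back onto the physical P x Q grid (hypothetical mapping; the
    papers' actual folding may differ)."""
    return r % P, c % Q

# Example: a non-square 2x3 grid is viewed as a 6x6 virtual grid,
# since lcm(2, 3) = 6; matrices are then distributed on the 6x6 grid.
alpha = lcm(2, 3)
```

Because α is a multiple of both P and Q, every physical process receives the same number of virtual grid positions, which is what makes the square virtual view compatible with a rectangular machine.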