2011
DOI: 10.1145/2049662.2049664

Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systems

Abstract: We present a simple and efficient methodology for the development, tuning, and installation of matrix algorithms such as the hybrid Strassen's and Winograd's fast matrix multiply or their combination with the 3M algorithm for complex matrices (i.e., hybrid: a recursive algorithm as Strassen's until a highly tuned BLAS matrix multiplication allows performance advantages). We investigate how modern Symmetric Multiprocessor (SMP) architectures present old and new challenges that can be addressed by the combinatio…
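The "hybrid" idea in the abstract (recurse as in Strassen's algorithm, then hand off to a conventional multiply once the blocks are small) and its combination with 3M can be sketched as follows. This is a minimal illustration and not the authors' implementation: all function names are hypothetical, `classical_multiply` merely stands in for a tuned BLAS GEMM call, the `cutoff` value is arbitrary, the recursion uses the textbook Strassen formulas rather than the Winograd variant, and the code is sequential with no SMP parallelism. It assumes square matrices stored as lists of rows.

```python
def classical_multiply(a, b):
    """Plain O(n^3) product; a stand-in for a tuned BLAS GEMM (assumption)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Split an even-sized square matrix into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def hybrid_strassen(a, b, cutoff=2):
    """Strassen recursion that falls back to the classical product.

    The fallback triggers when the block is at most `cutoff` wide or has an
    odd dimension (this sketch does no padding or peeling).
    """
    n = len(a)
    if n <= cutoff or n % 2:
        return classical_multiply(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive products instead of eight.
    m1 = hybrid_strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = hybrid_strassen(add(a21, a22), b11, cutoff)
    m3 = hybrid_strassen(a11, sub(b12, b22), cutoff)
    m4 = hybrid_strassen(a22, sub(b21, b11), cutoff)
    m5 = hybrid_strassen(add(a11, a12), b22, cutoff)
    m6 = hybrid_strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = hybrid_strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Reassemble the quadrants into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def multiply_3m(ar, ai, br, bi, cutoff=2):
    """3M complex product: three real matrix products instead of four.

    Inputs are the real and imaginary parts of A and B; returns (Cr, Ci).
    """
    t1 = hybrid_strassen(ar, br, cutoff)
    t2 = hybrid_strassen(ai, bi, cutoff)
    t3 = hybrid_strassen(add(ar, ai), add(br, bi), cutoff)
    return sub(t1, t2), sub(sub(t3, t1), t2)
```

In practice the cutoff would be chosen by measurement on the target machine, since it marks the size below which the conventional kernel is faster than another level of recursion.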

Cited by 19 publications (4 citation statements)
References 34 publications
“…In particular, numerical linear algebra based on Strassen's algorithm (if numerical stability issues have been considered acceptable) should clearly benefit from most of its results. Related work on the parallelization of the sub-cubic numerical linear algebra include [1,24,6,25,2].…”
Section: Introduction (mentioning)
confidence: 99%
“…There are several sequential implementations of Strassen's fast matrix multiplication algorithm [2,11,17], and parallel versions have been implemented for both shared-memory [9,25] and distributed-memory architectures [3,13]. For our parallel algorithms in Section 4, we use the ideas of breadth-first and depth-first traversals of the recursion trees, which were first considered by Kumar et al. [25] and Ballard et al. [3] for minimizing memory footprint and communication.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, this metric lets us compare all of the algorithms on an inverse-time scale, normalized by problem size [13,27]. We compare our code-generated Strassen implementation with MKL's dgemm and a tuned implementation of Strassen-Winograd from D'Alberto et al. [9] (recall that Strassen-Winograd performs the same number of multiplications but fewer matrix additions than Strassen's algorithm). The code generator's implementation outperforms MKL and is competitive with the tuned implementation.…”
Section: Code Generation (mentioning)
confidence: 99%