2006
DOI: 10.1007/11558958_30

A Family of High-Performance Matrix Multiplication Algorithms

Abstract: During the last half-decade, a number of research efforts have centered around developing software for generating automatically tuned matrix multiplication kernels. These include the PHiPAC project and the ATLAS project. The software end-products of both projects employ brute force to search a parameter space for blockings that accommodate multiple levels of memory hierarchy. We take a different approach: using a simple model of hierarchical memories we employ mathematics to determine a locally-optima…
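The blocked approach the abstract alludes to can be illustrated with a short sketch. The following C routine is not the paper's kernel; the block sizes MC, KC, and NC are illustrative placeholders rather than the model-derived values the authors compute. It partitions the operands so that each block is intended to fit in a level of the memory hierarchy, with a naive triple loop standing in for a tuned inner kernel.

#include <stddef.h>

/* Illustrative cache-blocking parameters; placeholders only. In the paper's
 * approach these would be derived from a model of the memory hierarchy
 * rather than found by exhaustive search. */
enum { MC = 128, KC = 256, NC = 512 };

/* C := C + A*B, column-major storage with leading dimensions lda, ldb, ldc.
 * The three outer loops partition the operands into cache-sized blocks;
 * the inner loops form a simple reference "kernel". */
static void gemm_blocked(size_t m, size_t n, size_t k,
                         const double *A, size_t lda,
                         const double *B, size_t ldb,
                         double *C, size_t ldc)
{
    for (size_t jc = 0; jc < n; jc += NC) {
        size_t nb = (n - jc < NC) ? n - jc : NC;
        for (size_t pc = 0; pc < k; pc += KC) {
            size_t kb = (k - pc < KC) ? k - pc : KC;
            for (size_t ic = 0; ic < m; ic += MC) {
                size_t mb = (m - ic < MC) ? m - ic : MC;
                /* "Inner kernel": multiply an mb x kb block of A by a
                 * kb x nb block of B, accumulating into C. */
                for (size_t j = 0; j < nb; ++j)
                    for (size_t p = 0; p < kb; ++p) {
                        double bpj = B[(pc + p) + (jc + j) * ldb];
                        for (size_t i = 0; i < mb; ++i)
                            C[(ic + i) + (jc + j) * ldc] +=
                                A[(ic + i) + (pc + p) * lda] * bpj;
                    }
            }
        }
    }
}

In the PHiPAC/ATLAS approach the block sizes are found by searching a parameter space; the contrast drawn in the abstract is that they can instead be chosen analytically from a model of the caches.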

Cited by 22 publications (9 citation statements). References 10 publications.
“…It is well-known that high performance can be achieved in a portable fashion by casting algorithms in terms of matrix-matrix multiplication [13,10,14,8]. In Figure 2 we show LINPACK(-like) and LAPACK blocked algorithms, LU_lin_blk and LU_lap_blk respectively, both built upon an LAPACK unblocked algorithm.…”
Section: Blocked Right-Looking LU Factorization (mentioning)
confidence: 99%
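As a concrete illustration of casting an algorithm in terms of matrix-matrix multiplication, here is a minimal sketch of a blocked right-looking LU factorization in C. It omits pivoting and is not the LAPACK or LINPACK routine the excerpt refers to; it only shows that, after each narrow panel is factored, the bulk of the flops land in the trailing-matrix update, which is a matrix-matrix multiply.

#include <stddef.h>

#define A(i, j) a[(i) + (j) * lda]   /* column-major element access */

/* Unblocked right-looking LU (no pivoting) of the m x nb panel whose
 * top-left element is A(k,k); overwrites it with its L and U factors. */
static void lu_unblocked(double *a, size_t lda,
                         size_t k, size_t m, size_t nb)
{
    for (size_t j = k; j < k + nb; ++j)
        for (size_t i = j + 1; i < k + m; ++i) {
            A(i, j) /= A(j, j);                   /* multiplier */
            for (size_t p = j + 1; p < k + nb; ++p)
                A(i, p) -= A(i, j) * A(j, p);     /* update within panel */
        }
}

/* Blocked right-looking LU (no pivoting) of the n x n matrix a. */
void lu_blocked(double *a, size_t lda, size_t n, size_t nb)
{
    for (size_t kk = 0; kk < n; kk += nb) {
        size_t b = (n - kk < nb) ? n - kk : nb;

        /* 1. Factor the current panel A[kk:n, kk:kk+b] unblocked. */
        lu_unblocked(a, lda, kk, n - kk, b);

        /* 2. A12 := L11^{-1} * A12: unit lower triangular solve applied
         *    to the block row A[kk:kk+b, kk+b:n]. */
        for (size_t j = kk + b; j < n; ++j)
            for (size_t p = kk; p < kk + b; ++p)
                for (size_t i = p + 1; i < kk + b; ++i)
                    A(i, j) -= A(i, p) * A(p, j);

        /* 3. Trailing update A22 := A22 - A21 * A12: a matrix-matrix
         *    multiply, where nearly all of the work is performed. */
        for (size_t j = kk + b; j < n; ++j)
            for (size_t p = kk; p < kk + b; ++p) {
                double apj = A(p, j);
                for (size_t i = kk + b; i < n; ++i)
                    A(i, j) -= A(i, p) * apj;
            }
    }
}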
“…Thus, it reveals that autotuning is unnecessary for the operation that has been touted by the autotuning community as the example of the success of autotuning. The problem with that work ([Yotov et al. 2005]) is that the ATLAS approach to optimizing gemm had been previously shown to be suboptimal, first in theory [Gunnels et al. 2001] and then in practice [Goto and van de Geijn 2008b]. Furthermore, ATLAS leverages an inner kernel optimized by a human expert, which still involves a substantial manual encoding.…”
Section: Introduction (mentioning)
confidence: 99%
“…-As mentioned previously, given that BLIS isolates performance-sensitive code to a few simple kernels, the framework may aid those who wish to automate the generation of high-performance linear algebra libraries from domain and hardware specifications [Püschel et al. 2005; Marker et al. 2012; Siek et al. 2008; Belter et al. 2009]. -As computing systems become less reliable, whether because of quantum physical effects, power consumption restrictions, or outright power failures, the community may become increasingly interested in adding algorithmic fault-tolerance to the BLAS (or BLAS-equivalent) layer of the dense linear algebra software stack [Gunnels et al. 2001b; Huang and Abraham 1984]. We plan to investigate the suitability of BLIS as a vehicle to provide such fault-tolerance.…”
Section: Discussion (mentioning)
confidence: 99%
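The algorithmic fault-tolerance mentioned in the excerpt (Huang and Abraham 1984) exploits checksum relations that matrix multiplication preserves. A minimal sketch, assuming column-major storage with leading dimensions equal to the row counts and a hypothetical tolerance parameter tol, checks that the column sums of a computed C = A*B agree with the sums predicted from A and B; a mismatch beyond roundoff signals a fault.

#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Checksum-style verification of C = A*B: because e^T C = (e^T A) B,
 * each column sum of C can be predicted from A and B and compared
 * against the computed result. `tol` is a hypothetical tolerance. */
bool gemm_checksum_ok(size_t m, size_t n, size_t k,
                      const double *A, const double *B, const double *C,
                      double tol)
{
    for (size_t j = 0; j < n; ++j) {
        /* Column sum of the computed C(:,j). */
        double c_sum = 0.0;
        for (size_t i = 0; i < m; ++i)
            c_sum += C[i + j * m];

        /* Predicted sum: sum_p (sum_i A(i,p)) * B(p,j). */
        double predicted = 0.0;
        for (size_t p = 0; p < k; ++p) {
            double a_col_sum = 0.0;
            for (size_t i = 0; i < m; ++i)
                a_col_sum += A[i + p * m];
            predicted += a_col_sum * B[p + j * k];
        }

        if (fabs(c_sum - predicted) > tol)
            return false;   /* fault (or excessive roundoff) detected */
    }
    return true;
}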
“…This idea is not new [Goto and van de Geijn 2008a, 2008b; Gunnels et al. 2001b; Whaley and Dongarra 1998]. Section 5 discusses how these level-3 operations are implemented in the BLIS framework so that flexibility (i.e., generality), portability, and high performance are simultaneously achieved.…”
Section: Level-3: Matrix-Matrix Operations (mentioning)
confidence: 99%