ASAP 2011 - 22nd IEEE International Conference on Application-Specific Systems, Architectures and Processors
DOI: 10.1109/asap.2011.6043234
A high-performance, low-power linear algebra core

Abstract: Achieving high performance while reducing power consumption is the key question as technology scaling reaches its limits. It is well accepted that application-specific custom hardware can achieve orders-of-magnitude improvements in efficiency. The question is whether such efficiency can be maintained while providing enough flexibility to implement a broad class of operations. In this paper, we aim to answer this question for the domain of matrix computations. We propose a design of a novel linear algebra p…

Cited by 22 publications (26 citation statements). References: 35 publications.
“…For the vector norm, we use the original algorithm as the baseline, which requires 257, 769, or 1025 operations for the corresponding vector norms of size k = 64, 128, and 256. Since our implementation effectively reduces the number of actually required computations, the extensions achieve higher GOPS/W than the peak GFLOPS/W reported for the LAC in [5].…”
Section: B. Performance and Efficiency Analysis
confidence: 81%
“…The microarchitecture of the Linear Algebra Core (LAC) is illustrated in Figure 1. The LAC achieves orders-of-magnitude better efficiency in power and area consumption than conventional general-purpose architectures [5]. It is specifically optimized to perform the rank-1 updates that form the inner kernels of parallel matrix multiplication.…”
Section: Architecture
confidence: 99%
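As a concrete illustration of that inner kernel, the sketch below (plain C, with assumed row-major layouts and hypothetical function names; it is not the LAC's actual datapath or microcode) expresses a blocked matrix multiplication C += A*B as a sequence of k rank-1 updates, each adding the outer product of one column of A and one row of B to C.

```c
#include <stddef.h>

/* Sketch only: C (n x n) += A (n x k) * B (k x n), row-major storage.
 * The multiplication is realized as k rank-1 updates: for each p,
 * C += A(:,p) * B(p,:). A hardware core would keep C resident in its
 * PE array and stream one column of A and one row of B per update. */
static void gemm_by_rank1_updates(size_t n, size_t k,
                                  double *C,
                                  const double *A,
                                  const double *B)
{
    for (size_t p = 0; p < k; ++p)          /* one rank-1 update per step */
        for (size_t i = 0; i < n; ++i)
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
}
```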
“…FFT algorithms typically perform poorly on general-purpose platforms because the power-of-two strides of the FFT algorithm interact poorly with set-associative caches, set-associative address-translation mechanisms, and power-of-two-banked memory subsystems [1]. FFTW, developed by M. Frigo et al., is known as the fastest software implementation of the FFT algorithm.…”
Section: Introduction
confidence: 99%
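To see why power-of-two strides are problematic, the small program below (cache geometry is an illustrative assumption, not taken from [1]) maps a sequence of strided addresses to cache sets: with a 32 KiB stride, 64-byte lines, and 512 sets, every access lands in the same set, so a set-associative cache starts conflicting long before its capacity is exhausted.

```c
#include <stdio.h>

int main(void)
{
    /* Assumed (illustrative) cache geometry, not from the cited paper. */
    const unsigned line_bytes = 64;       /* cache line size              */
    const unsigned num_sets   = 512;      /* number of cache sets         */
    const unsigned stride     = 32768;    /* power-of-two stride in bytes */

    for (unsigned i = 0; i < 8; ++i) {
        unsigned addr = i * stride;
        unsigned set  = (addr / line_bytes) % num_sets;
        /* Every strided access maps to set 0, so even an 8-way
         * associative cache conflicts after only 8 accesses. */
        printf("access %u: address %u -> set %u\n", i, addr, set);
    }
    return 0;
}
```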