2017
DOI: 10.1142/s0129626417500062

Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design

Abstract: Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building blocks for several High Performance Computing (HPC) applications and hence dictate performance of the HPC applications. Performance in such tuned packages is attained through tuning of several algorithmic and architectural parameters such as number of parallel operations in the Directed Acyclic Graph of the BLAS/LAPACK routines, sizes of the memories in the memory hierarchy of the underlying platform, bandw…

Cited by 10 publications
(4 citation statements)
References 22 publications
“…FPS has several resources to perform computations. In this exposition, we use carefully designed DOT4, a square root, and a divider for realization of DGEQR2, DGEQRF, DGEQR2HT, and DGEQRFHT routines [21] [22]. Logical place of arithmetic units is shown in the figure 12 and structure of DOT4 is shown in figure 13.…”
Section: Custom Realization Of Householder Transform and Results
mentioning
confidence: 99%
“…Realization of MHT outperforms realization of DGEMM as shown in figure 14(d). We also show scalability of our solution by attaching PE as a CFU in REDEFINE. Due to availability of double precision floating point arithmetic units like adder, multiplier, square root, and divider, we emphasize the realization of DGEQR2 and DGEQRF using HT and MHT [21] [22]. Organization of the paper is as follows: In section 2, we briefly discuss REDEFINE and some of the recent realizations of QR factorization.…”
mentioning
confidence: 99%
“…For our experiments, we use Processing Element (PE) design presented in [9]. We optimize Floating Point Unit (FPU) design presented in [13] with recommendations presented in [12] for optimum Instructions Per Cycle (IPC). The paper is organized as follows: In section 2, MFA, KF and REDEFINE are discussed.…”
Section: Introduction
mentioning
confidence: 99%