22nd International Conference on Field Programmable Logic and Applications (FPL) 2012
DOI: 10.1109/fpl.2012.6339142

Enhancing performance of Tall-Skinny QR factorization using FPGAs

Abstract: Communication-avoiding linear algebra algorithms with low communication latency and high memory bandwidth requirements, like Tall-Skinny QR factorization (TSQR), are highly appropriate for acceleration using FPGAs. TSQR parallelizes QR factorization of tall-skinny matrices in a divide-and-conquer fashion by decomposing them into sub-matrices, performing local QR factorizations and then merging the intermediate results. As TSQR is a dense linear algebra problem, one would therefore imagine GPUs to show better perfo…
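The decompose / local-QR / merge scheme described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's FPGA design: it computes only the R factor, uses modified Gram-Schmidt as the local QR kernel (a swapped-in choice for brevity), merges partial results with a binary reduction tree, and assumes every row block has at least as many rows as columns and full column rank (a rank-deficient block would cause a division by zero). The names `qr_r` and `tsqr_r` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: a list of rows


def qr_r(a: Matrix) -> Matrix:
    """Return the n x n upper-triangular factor R of a thin QR of `a` (m x n, m >= n).

    Uses modified Gram-Schmidt, so R has a positive diagonal when `a` has
    full column rank. The orthonormal columns of Q are formed but not returned.
    """
    m, n = len(a), len(a[0])
    if m < n:
        raise ValueError("need at least as many rows as columns")
    cols = [[a[i][j] for i in range(m)] for j in range(n)]  # working copy of the columns
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        q = [x / r[j][j] for x in cols[j]]  # j-th orthonormal column
        for k in range(j + 1, n):
            # Project each remaining column onto q and subtract that component.
            r[j][k] = sum(qi * ci for qi, ci in zip(q, cols[k]))
            cols[k] = [ci - r[j][k] * qi for qi, ci in zip(q, cols[k])]
    return r


def tsqr_r(a: Matrix, block_rows: int) -> Matrix:
    """R factor of a tall-skinny matrix via a binary-tree TSQR reduction."""
    m, n = len(a), len(a[0])
    if block_rows < n or m % block_rows != 0:
        raise ValueError("block_rows must be >= n and divide the row count")
    # Leaf level: local QR of each row block; these are independent of each other.
    rs = [qr_r(a[i:i + block_rows]) for i in range(0, m, block_rows)]
    # Merge levels: stack pairs of n x n R factors and re-factor the 2n x n result.
    while len(rs) > 1:
        merged = []
        for i in range(0, len(rs), 2):
            if i + 1 < len(rs):
                merged.append(qr_r(rs[i] + rs[i + 1]))  # list '+' stacks the rows
            else:
                merged.append(rs[i])  # an unpaired R moves up a level unchanged
        rs = merged
    return rs[0]


if __name__ == "__main__":
    # 12 x 3 matrix with rows [1, k, k^2]; any block of >= 3 rows has full rank.
    A = [[float((i + 1) ** j) for j in range(3)] for i in range(12)]
    for row in tsqr_r(A, block_rows=3):
        print(["%.6f" % x for x in row])
```

The merge step is valid because stacking the local factorizations gives A = diag(Q_1, …, Q_p) [R_1; …; R_p], so a QR of the stacked R factors yields the R of the whole matrix. Since both routines return R with a positive diagonal, `tsqr_r` should agree with a direct `qr_r(A)` up to rounding for any valid block size; the tree shape only changes the order in which partial results are combined.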

Cited by 14 publications (11 citation statements)
References 7 publications
“…If data is kept inside the device or a data reuse scheme can be devised (e.g. [33]), this benefits the FPGA.…”
Section: Discussion (mentioning), confidence: 99%
“…This can be a disadvantage for memory-bound likelihood computations. Nevertheless, FPGAs enjoy massive on-chip memory bandwidth (20-40 TB/sec [33]) due to large amounts of built-in memory. GPU on-chip memory bandwidths are limited to 8 TB/sec and 1.5 TB/sec [33].…”
Section: Discussion (mentioning), confidence: 99%
“…The least squares method (LS) or total least squares (TLS) [14] can be applied to find the frequencies of multiple incident sources. The proposed methods employ the LU factorization and the TLS to estimate the unknown frequency ω_i similar to the ESPRIT algorithm using the following steps:…”
Section: Stage 1: Frequency Estimation (mentioning), confidence: 99%
“…Previous FPGA-based implementations have looked at SVD [Brent and Luk (1982)], QRD [Wang and Leeser (2009)] and sparse LUD [Kapre and DeHon (2009)]. However, those approaches all have some limitations in common: either restricted with the scalability of the adapted matrices due to the logic capacity of FPGAs [Brent and Luk (1982); Ahmedsaid et al (2003); Ma et al (2006); Ledesma-Carrillo et al (2011); Wang and Leeser (2009)] or required the input matrices of special property or irregular sparsity structure [Rafique et al (2012); Tai et al (2011); Vachranukunkiet (2007); Kapre and DeHon (2009); Wu et al (2012)].…”
Section: Contributions: Fpga-based Accelerators For Matrix Decomposit (mentioning), confidence: 99%