QR decomposition using FPGAs

Parker, M.G.; Mauer, Volker; Pritsker, Dan

doi:10.1109/naecon.2016.7856841

Cited by 14 publications

(4 citation statements)

References 3 publications


“…A new loop structure of MGS algorithm was proposed by Langhammer and Pasca 27 and implemented on Intel Arria 10, the sustained‐to‐peak performance achieved was approximately 100%. It is expected that smaller size QRD would have higher speed; however, in Langhammer and Pasca, 27 bigger size QRD of 32 × 32 with 62.6 GFlops showed higher throughput than QRD in 37 which is 32 × 8 in size with 13.9 GFlops. This means algorithm structure affects synthesized hardware's performance.…”

Section: Previous Work (mentioning, confidence: 99%)

Efficient hardware‐accelerated pseudoinverse computation through algorithm restructuring for parallelization in high‐level synthesis

Tan, Ooi, Choo et al., 2021

Circuit Theory & Apps


This paper describes a fast and efficient hardware-accelerated pseudoinverse computation that restructures the algorithm and applies FPGA synthesis directives for parallelism prior to high-level synthesis (HLS). The algorithm is composed of modified Gram-Schmidt QR decomposition (MGS-QRD), triangular matrix inversion (TMI), and matrix multiplication (MM), and it is synthesized and implemented on a field-programmable gate array (FPGA). MGS-QRD is restructured and augmented with parallelism directives before synthesis, which yields an MGS-QRD hardware accelerator with high throughput. Modifications to the existing TMI algorithm are also proposed: redundant computational tasks are removed to speed up overall operation. Data dependencies in the MM algorithm were carefully considered so that appropriate parallelism directives could be inserted, and the data flow of MM was matched with the MGS-QRD and TMI modules to accelerate the pseudoinverse computation. The results show that the proposed pseudoinverse module outperforms the naïve implementation, which combines the existing MGS-QRD and TMI with a standard MM, in maximum frequency (1.24× speedup), hardware resources (48% reduction in DSP usage), latency (23% reduction), and throughput (62% increase).
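
The three stages named above (MGS-QRD, TMI, MM) compose as A⁺ = R⁻¹Qᵀ when A is tall and has full column rank. Below is a minimal plain-Python reference sketch under that assumption. It could be useful for checking an accelerator's output against known values. The function names are hypothetical, and this is not the authors' HLS code or their restructured loop order.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def mgs_qr(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Modified Gram-Schmidt QR of an m x n matrix A (m >= n, full column rank)."""
    m, n = len(a), len(a[0])
    v = [[a[i][j] for i in range(m)] for j in range(n)]  # v[j] = column j of A
    q: Matrix = []                                        # q[j] = column j of Q
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sum(x * x for x in v[j]) ** 0.5
        q.append([x / r[j][j] for x in v[j]])
        # MGS: remove the q_j component from every remaining column right away.
        for k in range(j + 1, n):
            r[j][k] = sum(qi * vi for qi, vi in zip(q[j], v[k]))
            v[k] = [vi - r[j][k] * qi for qi, vi in zip(q[j], v[k])]
    q_rows = [[q[j][i] for j in range(n)] for i in range(m)]  # back to m x n
    return q_rows, r


def invert_upper(r: Matrix) -> Matrix:
    """Inverse of a nonsingular upper-triangular matrix, column by column."""
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / r[j][j]
        # Solve R x = e_j for the entries above the diagonal, bottom-up.
        for i in range(j - 1, -1, -1):
            s = sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = -s / r[i][i]
    return inv


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product; assumes len(a[0]) == len(b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def pinv_qr(a: Matrix) -> Matrix:
    """Pseudoinverse A+ = R^{-1} Q^T, valid when A has full column rank."""
    q, r = mgs_qr(a)
    q_t = [list(col) for col in zip(*q)]
    return matmul(invert_upper(r), q_t)
```

For example, `pinv_qr([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])` returns approximately `[[2/3, -1/3, 1/3], [-1/3, 2/3, 1/3]]`. That matches (AᵀA)⁻¹Aᵀ for this matrix. In this sketch the three stages only meet at whole-matrix boundaries, whereas the abstract describes matching their data flow in hardware.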



“…A matrix A can be decomposed into the product of a Q orthogonal matrix and an R upper triangular matrix, where: A = QR, Fig. 2 illustrates the part of Q matrix computation from GS algorithm [10].…”

Section: Gram Schmidt Algorithm (mentioning, confidence: 99%)

Multi core processor for QR decomposition based on FPGA

Omran, Abdul-abbas, 2018

IJET


A multicore 32-bit processor is implemented in hardware to achieve low-latency, high-throughput QR decomposition (QRD) based on two algorithms: Gram-Schmidt (GS) and Givens Rotation (GR). The first core computes the orthogonal matrices using the Gram-Schmidt algorithm, and the second core computes the upper triangular matrices using the Givens Rotation algorithm. This multicore design achieves a throughput of 50M QRD/s for 4 × 4 matrices at a clock frequency of 200 MHz.
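
The abstract splits the work between a Gram-Schmidt core that produces Q and a Givens Rotation core that produces R. Here is a minimal floating-point sketch of the Givens half. It is not the paper's 32-bit processor design, and the function names are hypothetical. R is obtained by zeroing the subdiagonal one entry at a time with 2 × 2 plane rotations applied to adjacent rows.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def givens(a: float, b: float) -> Tuple[float, float]:
    """Return (c, s) such that [[c, s], [-s, c]] @ (a, b) has a zero second entry."""
    if b == 0.0:
        return 1.0, 0.0
    h = math.hypot(a, b)
    return a / h, b / h


def givens_r(a: Matrix) -> Matrix:
    """Reduce a copy of the m x n matrix A (m >= n) to upper-triangular R."""
    r = [row[:] for row in a]
    m, n = len(r), len(r[0])
    for j in range(n):
        # Zero column j below the diagonal, bottom-up, rotating rows i-1 and i.
        for i in range(m - 1, j, -1):
            c, s = givens(r[i - 1][j], r[i][j])
            # Columns left of j are already zero in these rows, so start at j.
            for k in range(j, n):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
    return r
```

For `[[3.0, 0.0], [4.0, 5.0]]`, a single rotation with c = 0.6 and s = 0.8 gives R = [[5, 4], [0, 3]]. Each rotation touches only two rows. That locality is one reason Givens rotations are a common choice for pipelined or systolic hardware.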


“…LU decomposition is also likely unsuitable for small matrices, and most works restrict their solution to nonsingular matrices to avoid costly pivoting. In this thesis, we adapted the work from Parker, Mauer and Pritsker (2016) to include a heterogeneous solution with OpenCL working in double precision. We published those results in "Exploration of FPGA-Based Hardware Designs for QR Decomposition for Solving Stiff ODE Numerical Methods Using the HARP Hybrid Architecture" (JUNIOR et al, 2020).…”

Section: Related Work (mentioning, confidence: 99%)

“…Columns: Iterative, Direct, Double, Memory type, Codesign
Jacobi (Souza, 2017): x x Local/Global x
QR (Souza, 2017): x x Local/global x
LU (Kapre, 2009): x x Local
LU (Daga, 2004): x x Local
LU (Zhuo, 2006): x x Local
LU (Wu, 2011): x Local/Global x
Jacobi (Ruan, 2013): x Local/Global x
QR (Parker, 2016): x Local
QR (Langhammer, 2018): x
LU (Ge, 2017): x Local/Global x
Cholesky (Liu, 2017): x x Local/Global
Gauss-Jordan (Jiang, 2017): x
Gauss-Jordan (Meng, 2022): x Local/Global
Truncated Spike (Macintosh, 2019): x Local/Global…”

Section: Related Work (mentioning, confidence: 99%)

Applying Rosenbrock method for solving stiff ODEs raised from the chemical reactivity of the atmosphere through heterogeneous architectures based on FPGAs

Souza Junior


This work focuses on solving stiff ordinary differential equations through numerical methods, applying hardware/software co-design techniques. Previous studies showed that stiff equations require implicit methods to avoid the very short steps of explicit methods. The problem is that these methods are based on converting nonlinear systems into linear systems; that is, matrix operations of the form Ax = b must be solved. During the master's research it became clear that the linear systems in CCATT-BRAMS require direct methods. In CCATT-BRAMS, this is solved with the Rosenbrock method, which has four stages (only the first requires a matrix decomposition). It is therefore possible to reuse the decomposition in the subsequent stages of the algorithm for solving the ordinary differential equations. The Rosenbrock algorithm was divided into two parts. The first concerns solving linear systems through direct methods. The second concerns modifying the Rosenbrock method to take advantage of the FPGA architecture. Our systematic review showed that very few works in the literature exploit the parallelism of ordinary differential equations in chemical reactivity problems on FPGAs. In this thesis, we provide FPGA solutions using Intel HLS OpenCL. Our results show that the hardware architecture generated for the CCATT-BRAMS problem is competitive and has the potential to improve the performance and energy efficiency of this application, which is so important for weather forecasting in Brazil.
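
The abstract states that only the first Rosenbrock stage needs a matrix decomposition and that later stages reuse it. This works because, in common Rosenbrock formulations, every stage solves a linear system with the same matrix. Below is a minimal factor-once, solve-many sketch, assuming a square nonsingular matrix. It uses Householder QR as a stand-in; the thesis adapts Parker, Mauer and Pritsker's FPGA QR design, which is not reproduced here. The class and method names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class HouseholderQR:
    """Factor a square nonsingular A = QR once, then reuse it for many b.

    Q is kept implicitly as a list of Householder vectors, not as a matrix.
    """

    def __init__(self, a: Matrix) -> None:
        n = len(a)
        self.n = n
        self.r = [row[:] for row in a]  # overwritten in place with R
        self.vs: List[Vector] = []      # one Householder vector per column step
        for j in range(n - 1):
            x = [self.r[i][j] for i in range(j, n)]
            norm_x = math.sqrt(sum(t * t for t in x))
            # Pick the sign of alpha that avoids cancellation when forming v.
            alpha = -norm_x if x[0] >= 0 else norm_x
            v = x[:]
            v[0] -= alpha
            self.vs.append(v)
            self._reflect_columns(v, j)

    def _reflect_columns(self, v: Vector, j: int) -> None:
        # Apply H = I - 2 v v^T / (v^T v) to rows j..n-1, columns j..n-1 of R.
        vtv = sum(t * t for t in v)
        for k in range(j, self.n):
            dot = sum(v[i] * self.r[j + i][k] for i in range(len(v)))
            f = 2.0 * dot / vtv
            for i in range(len(v)):
                self.r[j + i][k] -= f * v[i]

    def solve(self, b: Vector) -> Vector:
        """Solve A x = b as R x = Q^T b using the stored factorization."""
        y = b[:]
        # Q^T b: apply the reflectors in the order they were generated.
        for j, v in enumerate(self.vs):
            vtv = sum(t * t for t in v)
            f = 2.0 * sum(v[i] * y[j + i] for i in range(len(v))) / vtv
            for i in range(len(v)):
                y[j + i] -= f * v[i]
        # Back substitution on the upper-triangular R.
        x = [0.0] * self.n
        for i in range(self.n - 1, -1, -1):
            s = sum(self.r[i][k] * x[k] for k in range(i + 1, self.n))
            x[i] = (y[i] - s) / self.r[i][i]
        return x
```

Here `HouseholderQR([[4.0, 1.0], [2.0, 3.0]])` does the O(n³) factorization once. After that, `solve([1.0, 2.0])` and `solve([0.0, 1.0])` each cost only O(n²) and return approximately `[0.1, 0.6]` and `[-0.1, 0.4]`. That split mirrors the reuse the abstract describes across Rosenbrock stages.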
