Parallel LU factorization of sparse matrices on FPGA‐based configurable computing engines

Wang, Xiaofang; Ziavras, Sotirios G.

doi:10.1002/cpe.748

Cited by 34 publications

(37 citation statements)

References 31 publications

“…Figure 1 shows our processor-based system model for the parallel BDB LU factorization algorithm [7]. The binary tree interconnection network matches well the data communication model in our algorithm.…”

Section: FPGA-based Configurable Multiprocessors (mentioning)

confidence: 79%

“…The Shared RAM between two neighbors speeds up the system performance by eliminating the transfer of large blocks of data between memories. The sizes of the Shared RAM and Data Memory are determined based on the size of the largest 3-block matrix group that may appear in our algorithm [7] and the total available on-chip memory. We assign just enough space to the Boot ROM, Data Memory and Shared RAM in order to leave as much space as possible for the Program Memory.…”

Section: Sc (mentioning)

confidence: 99%
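
The sizing rule in the statement above can be illustrated with a small calculation. This is only a sketch under stated assumptions, not the authors' procedure: it assumes a 3-block group is one diagonal block plus its two border blocks, that all three are stored densely, and that a word is 4 bytes. The function names and the example sizes are hypothetical.

```python
def group_words(diag_size, border_size):
    """Words for one 3-block group, assumed to be a dense diagonal block
    (diag_size x diag_size) plus two dense border blocks
    (diag_size x border_size and border_size x diag_size)."""
    return diag_size * diag_size + 2 * diag_size * border_size


def largest_group_bytes(diag_sizes, border_size, bytes_per_word=4):
    """Bytes needed to hold the largest 3-block group (4-byte words assumed)."""
    return max(group_words(n, border_size) for n in diag_sizes) * bytes_per_word


def fits_on_chip(diag_sizes, border_size, on_chip_bytes, reserved_bytes,
                 bytes_per_word=4):
    """True if the largest group fits in the on-chip memory that remains
    after reserved_bytes (e.g. program memory) has been set aside."""
    need = largest_group_bytes(diag_sizes, border_size, bytes_per_word)
    return need <= on_chip_bytes - reserved_bytes


# Hypothetical partition: three diagonal blocks and a border of width 4.
print(largest_group_bytes([8, 12, 10], 4))
print(fits_on_chip([8, 12, 10], 4, on_chip_bytes=2048, reserved_bytes=1024))
```

A sparse storage format would lower these numbers; the dense count is used here only as a simple upper bound.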

“…We implemented the floating-point arithmetic and these functions in hardware, and interfaced the application code as custom instructions [7]. Such hardware customization also releases many resources to the processor for other tasks.…”

Section: Sc (mentioning)

confidence: 99%

“…Let us begin with the preprocessing phase where we attempt to order the matrix into an optimal BDB matrix [7]. The best ordering is the one that keeps the 3-block groups as dense as possible while not making the last block too large.…”

Section: Mapping Applications To the Multiprocessor (mentioning)

confidence: 99%
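
The ordering criterion quoted above (dense 3-block groups, small last block) can be made concrete with a scoring sketch. It does not compute an ordering; it only measures how dense each group is for a given block partition. The coordinate-set representation, the function name, and the assumption that a group is a diagonal block plus its right and bottom border blocks are all illustrative choices, not taken from the cited paper.

```python
def group_densities(nonzeros, diag_sizes, last_size):
    """Fill density of each assumed 3-block group in a BDB-ordered matrix.

    nonzeros   : iterable of (row, col) positions of nonzero entries
    diag_sizes : sizes of the independent diagonal blocks, in order
    last_size  : size of the last (border) block
    """
    n = sum(diag_sizes) + last_size
    border_start = n - last_size  # first row/column of the border
    densities = []
    start = 0
    for size in diag_sizes:
        end = start + size
        count = 0
        for r, c in nonzeros:
            in_rows = start <= r < end
            in_cols = start <= c < end
            in_diag = in_rows and in_cols
            in_right = in_rows and c >= border_start
            in_bottom = in_cols and r >= border_start
            if in_diag or in_right or in_bottom:
                count += 1
        area = size * size + 2 * size * last_size
        densities.append(count / area)
        start = end
    return densities


# Hypothetical 5x5 pattern: two 2x2 diagonal blocks and a 1x1 last block.
pattern = {(0, 0), (0, 1), (1, 1), (0, 4), (4, 1),
           (2, 2), (3, 3), (3, 4), (4, 4)}
print(group_densities(pattern, diag_sizes=[2, 2], last_size=1))
```

Comparing these densities, together with `last_size`, across candidate orderings gives one rough way to rank them under the stated criterion.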

“…[6] presents a recent example. We implemented a scalable shared-memory multiprocessor and mapped a parallel LU factorization algorithm onto an FPGA [7]. With its good performance, our machine shows the great potential of FPGA-based configurable technology for implementing parallel systems.…”

Section: Introduction (mentioning)

confidence: 99%

Performance optimization of an FPGA-based configurable multiprocessor for matrix operations

Wang

Ziavras

Proceedings. 2003 IEEE International Conference on Field-Programmable Technology (FPT) (IEEE Cat. No.03EX798)

Design space exploration for sparse matrix‐matrix multiplication on FPGAs

Lin

Wong

2011

Circuit Theory & Apps

SUMMARY: The design and implementation of a sparse matrix-matrix multiplication architecture on field-programmable gate arrays (FPGAs) is presented. Performance of the design, in terms of computational latency, as well as the associated power-delay and energy-delay tradeoffs, is studied. Taking advantage of the sparsity of the input matrices, the proposed design allows user-tunable power-delay and energy-delay tradeoffs by employing different numbers of processing elements (PEs) in the architecture and different block sizes in the blocking decomposition. This ability allows designers to employ different on-chip computational architectures for different system power-delay and energy-delay requirements. This is in contrast to conventional dense matrix-matrix multiplication architectures, which always favor the maximum number of PEs and the largest block size. In our implementation, energy consumption and power-delay product favor fewer PEs and a smaller block size for 90%-sparsity matrix-matrix multiplications, whereas a better energy-delay product is achieved with more PEs and a larger block size.
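
The two metrics named in the summary can pull in opposite directions, as a short calculation shows. The sketch below uses the standard definitions (power-delay product = power × delay, which is energy; energy-delay product = energy × delay). The design names and the power and delay figures are made up for illustration and are not results from the paper.

```python
def power_delay_product(power_w, delay_s):
    """Power-delay product in joules (equal to the energy consumed)."""
    return power_w * delay_s


def energy_delay_product(power_w, delay_s):
    """Energy-delay product in joule-seconds."""
    return power_delay_product(power_w, delay_s) * delay_s


def best_design(designs, metric):
    """Name of the design with the lowest value of the given metric."""
    return min(designs, key=lambda name: metric(*designs[name]))


# Hypothetical design points: name -> (power in watts, delay in seconds).
designs = {
    "few_pes_small_blocks": (1.0, 4.0),
    "many_pes_large_blocks": (3.0, 2.0),
}

print(best_design(designs, power_delay_product))
print(best_design(designs, energy_delay_product))
```

With these invented numbers the smaller configuration wins on power-delay product while the larger one wins on energy-delay product, because the second metric weights delay more heavily.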

An FPGA-Based Parallel Accelerator for Matrix Multiplications in the Newton-Raphson Method

Ziavras

Chang

2005

Embedded and Ubiquitous Computing – EUC 2005
