DOI: 10.1007/978-3-540-78610-8_10

A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation

Abstract: Recent developments in the capacity of modern Field Programmable Gate Arrays (FPGAs) have significantly expanded their applications. One such field is the acceleration of scientific computation and one type of calculation that is commonplace in scientific computation is the solution of systems of linear equations. A method that has proven in software to be very efficient and robust for finding such solutions is the Conjugate Gradient (CG) algorithm. In this paper we present a widely-parallel and deeply-pipelin…

Cited by 26 publications (18 citation statements)
References 22 publications
“…However, the benefits can only be attained by using factorization-based methods for solving linear systems, since the factorization is only computed once for both systems. Previous work [18], [19] suggests that iterative methods might be preferable in an FPGA implementation, due to the small number of division operations, which are very expensive in hardware, and because they allow one to trade off accuracy for computation time. In addition, these methods are easy to parallelize since they mostly consist of large matrix-vector multiplications.…”
Section: Algorithm Choice
confidence: 99%
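The trade-offs this citation describes can be seen in a minimal NumPy sketch of the CG algorithm (illustrative only, not the authors' FPGA design): each iteration is dominated by a single matrix-vector product, the only divisions are two scalar ones, and a looser tolerance directly trades accuracy for fewer iterations.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Plain CG for a symmetric positive-definite system A x = b."""
    x = np.zeros_like(b)
    r = b - A @ x              # initial residual
    p = r.copy()               # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p             # the one large matrix-vector product per iteration
        alpha = rs_old / (p @ Ap)    # scalar division #1
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:    # looser tol trades accuracy for time
            break
        beta = rs_new / rs_old       # scalar division #2
        p = r + beta * p
        rs_old = rs_new
    return x

# Example on a small SPD system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

Because the per-iteration cost is essentially one matvec, the bulk of the work parallelizes as a large dot-product tree, which is exactly what makes the method attractive in hardware.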
“…The overall latency of the circuit will be given by Latency = 2 I_NW (P_N + P) I_MR / FPGA_freq seconds, (18) where I_NW is the number of outer iterations in the interior-point method (Algorithm 1), FPGA_freq is the FPGA's clock frequency, and P is given by (15). In that time the controller will be able to output the result to 2P problems.…”
Section: Latency and Throughput
confidence: 99%
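The quoted latency model is a simple product of iteration counts over the clock frequency, so it can be evaluated directly. The parameter values below are illustrative assumptions only, not figures from the cited paper:

```python
# Evaluate Latency = 2 * I_NW * (P_N + P) * I_MR / FPGA_freq
# with hypothetical parameter values (all assumed for illustration).
I_NW = 20          # outer interior-point iterations (assumed)
P_N = 24           # pipeline-related term (assumed)
P = 8              # P as given by the paper's equation (15) (assumed)
I_MR = 30          # inner iterations (assumed)
FPGA_freq = 200e6  # 200 MHz clock (assumed)

latency_s = 2 * I_NW * (P_N + P) * I_MR / FPGA_freq
print(f"{latency_s * 1e3:.3f} ms of latency, amortised over 2P = {2 * P} problems")
```

Note how the model amortises the latency over 2P problems: a deeper pipeline (larger P) raises latency but also raises the number of problems solved in that window.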
“…We can accelerate scientific computations like the solution to systems of linear equations [2], [3] and [4] due to the ever-increasing capacity of modern FPGAs in terms of floating point units and on-chip memory. The symmetric extremal eigenvalue problem is an important scientific computation involving dense linear algebra where one is interested in finding only the extremal eigenvalues of an n × n symmetric matrix.…”
Section: Introduction
confidence: 99%
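One standard route to an extremal eigenvalue of a symmetric matrix is power iteration, sketched below; like CG, each step is just a matrix-vector product, the dense-linear-algebra kernel this citation highlights. (The citing work may well use a different method, such as Lanczos; this is illustrative only.)

```python
import numpy as np

def power_iteration(A, iters=500):
    """Approximate the largest-magnitude eigenvalue of a symmetric matrix A."""
    v = np.ones(A.shape[0])
    for _ in range(iters):
        w = A @ v                      # one matrix-vector product per step
        v = w / np.linalg.norm(w)      # renormalise to avoid overflow
    return v @ A @ v                   # Rayleigh quotient estimate

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues are 1 and 3
lam = power_iteration(A)
```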
“…CG implementations for dense systems [4] are easier to pipeline and achieve considerably higher memory efficiency due to the regular memory access pattern. On the other hand, sparse CG (where the system matrix A is sparse) is harder to parallelise due to the irregular access pattern and the mixture of both sparse and dense operations.…”
confidence: 99%
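The irregular access pattern this citation mentions is visible in a minimal sparse matrix-vector product over the common CSR storage format (a generic sketch, not the citing paper's implementation): the gather `x[indices[...]]` depends on the sparsity pattern of each row, which defeats the fixed, streaming access schedule that makes the dense case easy to pipeline.

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """y = A @ x for A stored in CSR form (data, indices, indptr)."""
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        start, end = indptr[i], indptr[i + 1]
        # Column indices differ per row: a data-dependent gather from x.
        y[i] = np.dot(data[start:end], x[indices[start:end]])
    return y

# A = [[4, 0, 1],
#      [0, 3, 0],
#      [1, 0, 2]]
data    = np.array([4.0, 1.0, 3.0, 1.0, 2.0])
indices = np.array([0, 2, 1, 0, 2])
indptr  = np.array([0, 2, 3, 5])
x = np.array([1.0, 1.0, 1.0])
y = csr_matvec(data, indices, indptr, x)  # -> [5., 3., 3.]
```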