DOI: 10.1007/978-3-540-78610-8_10

A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation

Abstract: Recent developments in the capacity of modern Field Programmable Gate Arrays (FPGAs) have significantly expanded their applications. One such field is the acceleration of scientific computation and one type of calculation that is commonplace in scientific computation is the solution of systems of linear equations. A method that has proven in software to be very efficient and robust for finding such solutions is the Conjugate Gradient (CG) algorithm. In this paper we present a widely-parallel and deeply-pipelin…

Cited by 26 publications (18 citation statements)
References 22 publications
“…However, the benefits can only be attained by using factorization-based methods for solving linear systems, since the factorization is only computed once for both systems. Previous work [18], [19] suggests that iterative methods might be preferable in an FPGA implementation, due to the small number of division operations, which are very expensive in hardware, and because they allow one to trade off accuracy for computation time. In addition, these methods are easy to parallelize since they mostly consist of large matrix-vector multiplications.…”
Section: Algorithm Choice
confidence: 99%
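The trade-offs this citation describes can be seen in a minimal NumPy sketch of the CG algorithm (illustrative only, not the authors' FPGA design): each iteration is dominated by a single matrix-vector product, the only divisions are two scalar ones, and a looser tolerance directly trades accuracy for fewer iterations.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Plain CG for a symmetric positive-definite system A x = b."""
    x = np.zeros_like(b)
    r = b - A @ x              # initial residual
    p = r.copy()               # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p             # the one large matrix-vector product per iteration
        alpha = rs_old / (p @ Ap)    # scalar division #1
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:    # looser tol trades accuracy for time
            break
        beta = rs_new / rs_old       # scalar division #2
        p = r + beta * p
        rs_old = rs_new
    return x

# Example on a small SPD system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

Because the per-iteration cost is essentially one matvec, the bulk of the work parallelizes as a large dot-product tree, which is exactly what makes the method attractive in hardware.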
“…The overall latency of the circuit will be given by Latency = 2 I_NW (P_N + P) I_MR / FPGA_freq seconds, (18) where I_NW is the number of outer iterations in the interior-point method (Algorithm 1), FPGA_freq is the FPGA's clock frequency, and P is given by (15). In that time the controller will be able to output the result to 2P problems.…”
Section: Latency and Throughput
confidence: 99%
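The quoted latency model is a simple product of iteration counts over the clock frequency, so it can be evaluated directly. The parameter values below are illustrative assumptions only, not figures from the cited paper:

```python
# Evaluate Latency = 2 * I_NW * (P_N + P) * I_MR / FPGA_freq
# with hypothetical parameter values (all assumed for illustration).
I_NW = 20          # outer interior-point iterations (assumed)
P_N = 24           # pipeline-related term (assumed)
P = 8              # P as given by the paper's equation (15) (assumed)
I_MR = 30          # inner iterations (assumed)
FPGA_freq = 200e6  # 200 MHz clock (assumed)

latency_s = 2 * I_NW * (P_N + P) * I_MR / FPGA_freq
print(f"{latency_s * 1e3:.3f} ms of latency, amortised over 2P = {2 * P} problems")
```

Note how the model amortises the latency over 2P problems: a deeper pipeline (larger P) raises latency but also raises the number of problems solved in that window.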
“…We can accelerate scientific computations like the solution to systems of linear equations [2], [3] and [4] due to the ever-increasing capacity of modern FPGAs in terms of floating point units and on-chip memory. The symmetric extremal eigenvalue problem is an important scientific computation involving dense linear algebra where one is interested in finding only the extremal eigenvalues of an n × n symmetric matrix.…”
Section: Introduction
confidence: 99%
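One standard route to an extremal eigenvalue of a symmetric matrix is power iteration, sketched below; like CG, each step is just a matrix-vector product, the dense-linear-algebra kernel this citation highlights. (The citing work may well use a different method, such as Lanczos; this is illustrative only.)

```python
import numpy as np

def power_iteration(A, iters=500):
    """Approximate the largest-magnitude eigenvalue of a symmetric matrix A."""
    v = np.ones(A.shape[0])
    for _ in range(iters):
        w = A @ v                      # one matrix-vector product per step
        v = w / np.linalg.norm(w)      # renormalise to avoid overflow
    return v @ A @ v                   # Rayleigh quotient estimate

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues are 1 and 3
lam = power_iteration(A)
```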
“…CG implementations for dense systems [4] are easier to pipeline and achieve considerably higher memory efficiency due to the regular memory access pattern. On the other hand, sparse CG (where the system matrix A is sparse) is harder to parallelise due to the irregular access pattern and the mixture of both sparse and dense operations.…”
confidence: 99%
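The irregular access pattern this citation mentions is visible in a minimal sparse matrix-vector product over the common CSR storage format (a generic sketch, not the citing paper's implementation): the gather `x[indices[...]]` depends on the sparsity pattern of each row, which defeats the fixed, streaming access schedule that makes the dense case easy to pipeline.

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """y = A @ x for A stored in CSR form (data, indices, indptr)."""
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        start, end = indptr[i], indptr[i + 1]
        # Column indices differ per row: a data-dependent gather from x.
        y[i] = np.dot(data[start:end], x[indices[start:end]])
    return y

# A = [[4, 0, 1],
#      [0, 3, 0],
#      [1, 0, 2]]
data    = np.array([4.0, 1.0, 3.0, 1.0, 2.0])
indices = np.array([0, 2, 1, 0, 2])
indptr  = np.array([0, 2, 3, 5])
x = np.array([1.0, 1.0, 1.0])
y = csr_matvec(data, indices, indptr, x)  # -> [5., 3., 3.]
```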