A fast FPGA based architecture for computation of square root and Inverse Square Root

Hasnat, Abul; Bhattacharyya, Tanima; Dey, Atanu; Halder, Sourav; Bhattacharjee, Debotosh

doi:10.1109/devic.2017.8073975

Cited by 28 publications

(23 citation statements)

References 15 publications


“…The maximum throughput of this design exceeds the throughput of the designs presented in [7, 11], which used Xilinx Virtex‐5 and Virtex‐6 devices. Additionally, the design presented by Hasnat et al that used magic numbers required 12% of the high‐end Xilinx Virtex‐5 flip‐flop logic.…”

Section: Results (mentioning)

confidence: 91%


Efficient digital implementation of a multi‐precision square‐root algorithm

Beasley, Watson, Clarke. 2018. IET Computers & Digital Techniques

In high performance computing systems and signal processing, there is a basic set of mathematical functions that are essential. While addition, subtraction and multiplication are well understood, there is less literature on square-rooting, which is a particularly time- and resource-consuming function. Traditional non-restoring algorithms produce a mantissa half the length of the input mantissa, causing a loss of precision. This study presents a method for increasing the accuracy of this algorithm. It is shown to work for all IEEE-754R standard floating-point numbers. Error analysis shows a 57-fold (for half-precision) and 134e6-fold (for double-precision) improvement in the normalised error, equivalent to at most 1 Unit of Least Precision. Resource and performance optimised variants are presented and their throughput analysed. On an Intel Stratix V device, performance optimised implementations achieve a throughput of 717 MFLOPs. Resource optimised implementations on a low-cost device require only 127 Adaptive Logic Modules and 232 registers, with a throughput of 8.56 MFLOPs. All implementations are DSP block and memory free, saving valuable resources. The maximum throughput of the presented design is 15.5 times greater than that proposed by Pimentel et al. and two orders of magnitude greater than typical multiply-accumulate methods.
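The halving of the mantissa length mentioned in the abstract can be seen in a small software model. The sketch below is an assumption-laden illustration, not the design from the study: it uses the simpler restoring digit-by-digit recurrence rather than a non-restoring one, the function name `isqrt_restoring` is hypothetical, and zero-padding the radicand is only one common way to recover the missing result bits.

```python
import math

def isqrt_restoring(n: int, bits: int) -> int:
    """Return floor(sqrt(n)) for 0 <= n < 2**bits.

    Restoring digit-by-digit recurrence: each iteration consumes two
    radicand bits and produces one result bit, so a `bits`-wide input
    yields only bits/2 result bits.
    """
    if bits % 2:
        bits += 1  # process the radicand in whole 2-bit pairs
    root = 0
    rem = 0
    for shift in range(bits - 2, -1, -2):
        # Bring down the next pair of radicand bits.
        rem = (rem << 2) | ((n >> shift) & 0b11)
        trial = (root << 2) | 1  # (2*root + 1) at the current bit weight
        root <<= 1
        if rem >= trial:  # the trial subtraction succeeds: result bit is 1
            rem -= trial
            root |= 1
    return root

# A 24-bit mantissa (hypothetical example value) gives a 12-bit root.
mantissa = 0xC00000
short_root = isqrt_restoring(mantissa, 24)

# Zero-padding the radicand by 24 bits gives a full 24-bit root.
full_root = isqrt_restoring(mantissa << 24, 48)

print(short_root.bit_length(), full_root.bit_length())
print(full_root == math.isqrt(mantissa << 24))
```

The padded variant doubles the iteration count, which is the kind of cost and precision trade-off the abstract's resource and performance optimised variants address.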



“…Recent work into fast FPGA architectures for square‐root and inverse square‐roots is presented by Hasnat et al [11]. A methodology is presented that uses seven ‘magic’ numbers, found experimentally.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient digital implementation of a multi‐precision square‐root algorithm

Beasley, Watson, Clarke. 2018. IET Computers & Digital Techniques

“…Hasnat et al [24] uses 199 LUTs and 24 FFs which are quite low. However, we assume they use 14 DSPs which is above our numbers.…”

Section: Results (mentioning)

confidence: 99%

“…Hasnat et al [24] develop an FPGA implementation that calculates single-precision floating-point square root and inverse square root simultaneously with Quake's algorithm [25] modified using Newton Raphson method. They achieve quite low numbers for resource usage on Virtex 5, however, they do not support pipelining and do not provide the number for DSP usage.…”

Section: Related Work (mentioning)

confidence: 99%
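For readers unfamiliar with the technique named in the statement above, the following is a minimal software sketch of the Quake-style inverse square root followed by Newton-Raphson refinement, with the square root derived from the same result. It is not the architecture of Hasnat et al.: it assumes the single classic constant `0x5F3759DF` rather than the seven experimentally found constants the citing papers mention, the function names are hypothetical, and the refinement runs in Python's double precision rather than in single-precision hardware. Inputs are assumed to be positive, normal single-precision values.

```python
import struct

# Classic constant from the Quake III routine (assumption: NOT one of the
# constants used in the cited FPGA design).
MAGIC = 0x5F3759DF

def float_to_bits(x: float) -> int:
    """Reinterpret x, rounded to IEEE-754 binary32, as a 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def fast_rsqrt(x: float, iterations: int = 1) -> float:
    """Approximate 1/sqrt(x) for positive, normal x."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Initial guess from the shifted bit pattern and the magic constant.
    y = bits_to_float(MAGIC - (float_to_bits(x) >> 1))
    # Newton-Raphson step for f(y) = 1/y**2 - x.
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def fast_sqrt_and_rsqrt(x: float, iterations: int = 1) -> tuple[float, float]:
    """Return (sqrt(x), 1/sqrt(x)) from one inverse-square-root estimate."""
    r = fast_rsqrt(x, iterations)
    return x * r, r  # sqrt(x) = x * (1/sqrt(x)), so no division is needed

root, inv_root = fast_sqrt_and_rsqrt(4.0)
print(root, inv_root)
```

Each additional Newton-Raphson iteration costs a few multiplications and roughly squares the relative error, so the iteration count is the main accuracy knob in this sketch.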

Using Harmonized Parabolic Synthesis to Implement a Single-Precision Floating-Point Square Root Unit

Savas, Atwa, Nordström, et al. 2019. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

This paper proposes a novel method for performing square root operation on floating-point numbers represented in IEEE-754 single-precision (binary32) format. The method is implemented using Harmonized Parabolic Synthesis. It is implemented with and without pipeline stages individually and synthesized for two different Xilinx FPGA boards. The implementations show better resource usage and latency results when compared to other similar works including Xilinx intellectual property (IP) that uses the CORDIC method. Any method calculating the square root will make approximation errors. Unless these errors are distributed evenly around zero, they can accumulate and give a biased result. An attractive feature of the proposed method is the fact that it distributes the errors evenly around zero, in contrast to CORDIC for instance. Due to the small size, low latency, high throughput, and good error properties, the presented floating-point square root unit is suitable for high performance embedded systems. It can be integrated into a processor's floating point unit or be used as a stand-alone accelerator.


“…In particular, reducing the number of calls for the square root operations proves to be significant. This is because, in modern processors, the square root operation is not computed directly but through the calculation of the inverse square root [35], which requires calling a division operation at the end of the calculation in order to obtain the sought result [36,37,38]. Therefore, the total impact of the arithmetic reduction in the proposed algorithm is not just the ratio of the counting of arithmetic operations of the Cholesky based approach and the proposed fast algorithm.…”

Section: Discussion (mentioning)

confidence: 99%

Fast matrix inversion and determinant computation for Polarimetric Synthetic Aperture Radar

Coelho, Cintra, Frery, et al. 2018. Computers & Geosciences

This paper introduces a fast algorithm for simultaneous inversion and determinant computation of small sized matrices in the context of fully Polarimetric Synthetic Aperture Radar (PolSAR) image processing and analysis. The proposed fast algorithm is based on the computation of the adjoint matrix and the symmetry of the input matrix. The algorithm is implemented in a general purpose graphical processing unit (GPGPU) and compared to the usual approach based on Cholesky factorization. The assessment with simulated observations and data from an actual PolSAR sensor show a speedup factor of about two when compared to the usual Cholesky factorization. Moreover, the expressions provided here can be implemented in any platform.

