Floating-point division and square root using a Taylor-series expansion algorithm

Kwon, Taek-Jun; Draper, J.

doi:10.1016/j.mejo.2009.03.004

Cited by 40 publications

(26 citation statements)

References 7 publications


“…A fused floating-point multiply/divide/square root unit based on Taylor-series expansion algorithm was introduced in Ref. 17. This unit can achieve faster computation speed yet with more area consuming.…”

Section: Related Work (mentioning)

confidence: 99%

An Area-Efficient Unified Architecture for Multi-Functional Double-Precision Floating-Point Computation

Guo

Cui

et al. 2015

J CIRCUIT SYST COMP


In this paper, we propose a unified architecture for computation of double-precision floating-point division, reciprocal, square root, inverse square root and multiplication with a significant area reduction. First, a double-precision multiplication-based divider, the common datapath shared with these arithmetic computations, is optimized by a modified Goldschmidt algorithm to achieve better area efficiency. In this algorithm, a linear-degree minimax approximation instead of second-degree is used to obtain a 15-bit precision estimate of the reciprocal so that we can get a rather small lookup table (LUT) as well as reduced amount of computation when accumulating the partial products. Two Goldschmidt iterations specially designed for hardware reuse are performed to gain the final accurate result of division. By virtue of the pipelined processing, the time cost for the two iterations is minimized. Second, a reconfigurable datapath with a little extra area cost is introduced to dynamically support multiple double-precision computations by executing the optimized divider iteratively. The design is finally implemented and synthesized in SMIC 0.13-μm CMOS process. The experimental results show that the proposed design can achieve a speed of 400 MHz with area of 61.6 K logic gates and 9-Kb LUT. Compared with other works, the area efficiency (performance/area ratio) of the proposed unified architecture is increased by about 20% in average, which is a better performance-area trade-off for embedded microprocessors.
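The seed-plus-iteration scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' datapath: it uses ordinary floats instead of fixed-point hardware, and the table size `SEGMENTS = 128` and the names `reciprocal_seed` and `goldschmidt_divide` are assumptions chosen for illustration. Each table entry holds a linear minimax fit of 1/d on one sub-interval of [1, 2), which under these assumptions gives a seed with better than 15-bit relative accuracy; two Goldschmidt steps then roughly quadruple the number of correct bits.

```python
SEGMENTS = 128  # hypothetical table size, not taken from the paper


def _build_table(segments):
    """Linear minimax fit of 1/x on each of `segments` slices of [1, 2)."""
    table = []
    h = 1.0 / segments
    for i in range(segments):
        x0 = 1.0 + i * h
        x1 = x0 + h
        slope = -1.0 / (x0 * x1)          # slope of the chord through both endpoints
        xt = (x0 * x1) ** 0.5             # point where the tangent is parallel to the chord
        chord_at_xt = 1.0 / x0 + slope * (xt - x0)
        shift = (chord_at_xt - 1.0 / xt) / 2.0  # lower the chord by half the largest gap
        intercept = 1.0 / x0 - slope * x0 - shift
        table.append((slope, intercept))
    return table


TABLE = _build_table(SEGMENTS)


def reciprocal_seed(d):
    """Table-based linear estimate of 1/d for d in [1, 2)."""
    index = min(int((d - 1.0) * SEGMENTS), SEGMENTS - 1)
    slope, intercept = TABLE[index]
    return slope * d + intercept


def goldschmidt_divide(n, d, iterations=2):
    """Approximate n / d for a divisor already normalized to [1, 2)."""
    if not 1.0 <= d < 2.0:
        raise ValueError("divisor must be normalized to [1, 2)")
    f = reciprocal_seed(d)
    q = n * f   # running quotient estimate
    r = d * f   # running denominator, driven towards 1
    for _ in range(iterations):
        f = 2.0 - r   # correction factor: if r = 1 - e, then r * f = 1 - e**2
        q *= f
        r *= f
    return q
```

For example, `goldschmidt_divide(1.5, 1.25)` returns a value close to 1.2. A real floating-point unit would also handle signs, exponents, special values, and final rounding, all of which are omitted here.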


“…Considering the average lower frequency of use of division and square root [6], the considerable silicon resources, and disparately long latencies both require, neither is found in every strip. In the case of SQRT, a software implementation of Heron's method overloads any calls made.…”

Section: A Many-core Pe (mentioning)

confidence: 99%

Towards a many-core architecture for HPC

Wyngaard

Inggs

Collins

et al. 2013

2013 23rd International Conference on Field Programmable Logic and Applications


Many-core architectures are a current avenue of research, seeking alternative higher efficiency computing, and HPC is one domain which may benefit most from such a model. While at an initial prototyping stage we present here the design of a MIMD many-core processor, Fynbos and, considering the problems of programmability, an autoparallelising Fortran pipeline. Our initial operating results demonstrate functionality, and the effectiveness of the compiler as the system efficiency increases with problem size in a test case multi-body simulation. The test case also serves to highlight system weaknesses. We conclude that the demonstration offers sufficient motivation for the future work discussed.
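The citation statement for this paper mentions a software implementation of Heron's method standing in for a hardware square root. A minimal Python sketch of that method follows. It is not the Fynbos code: the function name `heron_sqrt`, the stopping tolerance, and the exponent-based starting guess are assumptions made for illustration.

```python
import math


def heron_sqrt(x, tolerance=1e-15, max_iterations=100):
    """Approximate sqrt(x) by repeatedly averaging a guess with x / guess."""
    if x < 0.0:
        raise ValueError("square root of a negative number")
    if x == 0.0:
        return 0.0
    # Starting guess from the binary exponent (an illustrative choice):
    # x = m * 2**e with 0.5 <= m < 1, so 2**((e + 1) // 2) is near sqrt(x).
    _, exponent = math.frexp(x)
    guess = math.ldexp(1.0, (exponent + 1) // 2)
    for _ in range(max_iterations):
        next_guess = 0.5 * (guess + x / guess)
        # Stop once successive guesses agree to the relative tolerance.
        if abs(next_guess - guess) <= tolerance * next_guess:
            return next_guess
        guess = next_guess
    return guess
```

Each pass needs one division, one addition, and one halving, which is why the method suits a core that has a divider (or a software divide) but no dedicated square-root hardware.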


“…The floating point square root implementation on hardware proves to be accurate but it also occupies greater amount of hardware resources than compared to the pipelined hardware implementation of the modified non-restoring square root algorithm [1][2][5].…”

Section: Introduction (mentioning)

confidence: 99%

“…However, the most complex arithmetic operation has been the square root operation due to its dependence on complex approximation algorithms [2][3][4][6][7]. Several algorithms for calculating square root has been developed and implemented on FPGAs [4].…”

Section: Introduction (mentioning)

confidence: 99%

New efficient hardware design methodology for modified non-restoring square root algorithm

Rahman

Abdullah-Al-Kafi

2014

2014 International Conference on Informatics, Electronics & Vision (ICIEV)


This paper shows a new methodology to design the hardware for computing square root of N-bit unsigned numbers. The proposed hardware design is based on the modified nonrestoring square root algorithm. Two different hardware designs, sequential pipeline architecture and asynchronous architecture for computing N-bit fixed point square root operation are proposed. The synthesis report of the designed FPGA based pipelined hardware for 32-bit square root operation shows that the usage of the logical resources of FPGA is significantly less than that of the earlier proposed pipelined hardware designs based on modified non-restoring algorithm. Moreover, the proposed pipelined hardware design can be configured to calculate square root of 32-bit number in 16 and 8 clock cycles. The maximum frequency achieved for the operation latency of 16-clock cycles for computing 32-bit unsigned square root is 403.770 MHz. The maximum frequency achieved for operating latency of 8-clock cycles is 260.233 MHz. On the other side, proposed asynchronous architecture based FPGA hardware design supersedes the earlier proposed asynchronous hardware designs for N-bit square root operation in terms of the less usage of hardware resources. Both the pipelined and asynchronous hardware designs are tested on Xilinx Virtex 7 XC7VX980T-2, Virtex 5 XC5VLX330T-2 and Spartan 3E XC3S1600E-5 FPGAs.
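For orientation, the textbook non-restoring square root on which such designs build can be sketched in Python as below. This is the classic digit-by-digit algorithm, not the modified version or the pipelined hardware proposed in the paper; the function name `nonrestoring_isqrt` and its `width` parameter are assumptions for illustration. One result bit is produced per iteration, so a 32-bit operand takes 16 iterations.

```python
def nonrestoring_isqrt(value, width=32):
    """Return (root, remainder) with root**2 + remainder == value.

    Classic non-restoring digit recurrence: a negative partial remainder is
    kept and compensated by an addition in the next step, instead of being
    restored immediately.
    """
    if width <= 0 or width % 2 != 0:
        raise ValueError("width must be a positive even number of bits")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    root = 0
    remainder = 0
    for i in range(width // 2 - 1, -1, -1):
        pair = (value >> (2 * i)) & 0b11   # next two operand bits, MSB first
        if remainder >= 0:
            remainder = remainder * 4 + pair - ((root << 2) | 1)
        else:
            remainder = remainder * 4 + pair + ((root << 2) | 3)
        # The sign of the new partial remainder selects the next root bit.
        root = (root << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:
        remainder += (root << 1) | 1       # single final correction
    return root, remainder
```

As a usage example, `nonrestoring_isqrt(24)` returns `(4, 8)`, since 4² + 8 = 24.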

