Fast division algorithm with a small lookup table

Hung, Patrick; Fahmy, Hossam A. H.; Mencer, Oskar; Flynn, Michael J.

doi:10.1109/acssc.1999.831992

Cited by 39 publications

(36 citation statements)

References 3 publications


“…Leeser and Wang [20] implement floating-point division with variable precision on a Xilinx Virtex-II FPGA. The division is based on look-up tables and Taylor series expansion by Hung et al. [21], which uses a 12.5 KB look-up table and two multiplications. Regardless of the FPGA, the memory requirement is more than that of the proposed implementations.…”

Section: Related Work (mentioning)

confidence: 99%

Efficient Single-Precision Floating-Point Division Using Harmonized Parabolic Synthesis

Savas, Hertz, Nordström, et al. 2017

2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)


Abstract: This paper proposes a novel method for performing division on floating-point numbers represented in IEEE-754 single-precision (binary32) format. The method is based on an inverter, implemented as a combination of Parabolic Synthesis and second-degree interpolation, followed by a multiplier. It is implemented both with and without pipeline stages and synthesized for a Xilinx UltraScale FPGA. The implementations show better resource usage and latency than other implementations based on different methods. In terms of throughput, the proposed method outperforms most of the other works; however, some Altera FPGAs achieve a higher clock rate due to differences in the DSP slice multiplier design. Due to its small size, low latency, and high throughput, the presented floating-point division unit is suitable for high-performance embedded systems and can be integrated into accelerators or used as a stand-alone accelerator.


“…Hung [8] and Jeong [9] proposed pipelined division algorithms. These express division with Taylor-series expansions, and then calculate the upper two or four terms with an LUT and multipliers.…”

Section: Related Work (mentioning)

confidence: 99%
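
The Taylor-series formulation mentioned in this statement can be written out as follows. This is a hedged sketch, not a reproduction of the cited papers' equations: it assumes the divisor $Y$ is split into an upper part $Y_h$ (used to index the LUT) and a small lower part $Y_l$, and that only the first two terms of the expansion are kept. The symbols $Y_h$ and $Y_l$ are notation assumed here for illustration.

$$
\frac{X}{Y} = \frac{X}{Y_h + Y_l}
= \frac{X}{Y_h}\left(1 - \frac{Y_l}{Y_h} + \frac{Y_l^2}{Y_h^2} - \cdots\right)
\approx \frac{X\,(Y_h - Y_l)}{Y_h^2}
$$

Under these assumptions the relative error of the truncated form is

$$
1 - \frac{(Y_h - Y_l)(Y_h + Y_l)}{Y_h^2} = \frac{Y_l^2}{Y_h^2},
$$

so if $Y_h \ge 1$ and $0 \le Y_l < 2^{-m}$, the error stays below $2^{-2m}$. The LUT then only needs to supply $1/Y_h^2$, and the remaining work is two multiplications.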

“…This method conducts division with the procedure in Eq. (3) and it reduces the area by 27%, as compared to the algorithm of [8]. Figure 2 shows the block diagram of [9].…”

Section: Related Work (mentioning)

confidence: 99%

“…(8). The range arises from the expressional limitation of fixed-point representation, and the ranges of minimum and maximum values of division results are the same as the range of X in Eq.…”

Section: Boundary Conditions For Error Analysis (mentioning)

confidence: 99%

“…Hung et al. store the first two terms of the Taylor series in a lookup table (LUT), and execute a division by referencing the LUT in the first step and using a multiplier in the second step [8]. In [9], a cost-effective pipelined division algorithm has been proposed by modifying the Taylor series and decreasing the LUT size.…”

Section: Introduction (mentioning)

confidence: 99%
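
The two-step scheme described in this statement (a LUT reference followed by multiplication) can be illustrated with a small floating-point model. This is a minimal sketch under stated assumptions, not the cited hardware design: it assumes a normalized divisor in [1, 2), a table of 1/Y_h² indexed by the upper `M` fractional bits of the divisor, and the two-term approximation X/Y ≈ X·(Y_h − Y_l)/Y_h². The names `M`, `build_lut`, and `approx_divide` are hypothetical.

```python
# Hypothetical parameter: number of fractional bits of y used as the LUT index.
M = 8


def build_lut(m: int) -> list[float]:
    """Table of 1 / y_h**2 for every y_h = 1 + k / 2**m in [1, 2)."""
    return [1.0 / (1.0 + k / 2**m) ** 2 for k in range(2**m)]


LUT = build_lut(M)


def approx_divide(x: float, y: float, lut: list[float]) -> float:
    """Approximate x / y as x * (y_h - y_l) / y_h**2 (two-term Taylor form).

    Assumes y is already normalized to [1, 2); exponent handling,
    rounding, and fixed-point bit widths are deliberately left out.
    """
    if not 1.0 <= y < 2.0:
        raise ValueError("this sketch assumes a normalized divisor in [1, 2)")
    m = len(lut).bit_length() - 1   # table has 2**m entries
    k = int((y - 1.0) * 2**m)       # upper m fractional bits of y
    y_h = 1.0 + k / 2**m            # truncated divisor, matches lut[k]
    y_l = y - y_h                   # remainder, 0 <= y_l < 2**-m
    # Step 1: one table lookup. Step 2: two multiplications.
    return (x * (y_h - y_l)) * lut[k]
```

With `M = 8` the table has 256 entries and the relative error of the approximation is bounded by (Y_l/Y_h)², which is below 2⁻¹⁶ in this model. A real implementation would size the table entries and multipliers in fixed point, which this sketch does not attempt.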


Effective Fixed-Point Pipelined Divider for Mobile Rendering Processors

Park, Bae, et al. 2013

IEICE Trans. Inf. & Syst.


Summary: In this paper, we propose an area- and speed-effective fixed-point pipelined divider that reduces the bit-width of a division unit to fit a mobile rendering processor. To decide the bit-width of the division unit, error analysis has been carried out in various ways. As a result, where the original bit-width was 31 bits, the proposed method reduced it to 24 bits and reduced the area by 42%, with a maximum error of 0.00001%.


Series Expansion based Efficient Architectures for Double Precision Floating Point Division

Jaiswal, Cheung, Balakrishnan, et al. 2014

Circuits Syst Signal Process


Floating-point division is among the most complex floating-point arithmetic operations; it is also an area- and performance-dominating unit. This paper presents double-precision floating-point division architectures on FPGA platforms. The designs are area-optimized, run at higher clock speeds with lower latency, and are fully pipelined. The proposed architectures are based on the well-known Taylor-series expansion and use a relatively small amount of hardware in terms of memory (initial look-up table), multiplier blocks, and slices. Two architectures are presented with various trade-offs among area, memory, and accuracy. The designs use partial block multipliers (PBM) to reduce hardware usage while minimizing the loss of accuracy. All the implementations have been targeted and optimized separately for different Xilinx FPGAs to exploit their specific resources efficiently. Compared to previously reported literature, the proposed architectures require less area and have reduced latency, with the advantage of higher performance gain. The accuracy of the designs has been both theoretically analyzed and validated using random test cases.
