Design Trade-Offs in Floating-Point Unit Implementation for Embedded and Processing-In-Memory Systems

Kwon, Taek-Jun; Sondeen, J.; Draper, J.

doi:10.1109/iscas.2005.1465341

Cited by 13 publications

(6 citation statements)

References 7 publications


“…This work focussed on the improvements of the logarithmic arithmetic unit design in [11] by enhancing the two important procedures for the LNS addition and subtraction: interpolation and cotransformation. The new arrangement of the two procedures would be able to improve the addition and subtraction operation in LNS, which could represent a whole new logarithmic arithmetic unit architecture.…”

Section: Methods (mentioning)

confidence: 99%


Less memory and high accuracy logarithmic number system architecture for arithmetic operations

Naziri, Ismail, Isa, et al. 2021. IJEECS.

Interpolation is another important procedure for logarithmic number system (LNS) addition and subtraction. As a medium of approximation, the interpolation procedure has an urgent need to be enhanced to increase the accuracy of the operation results. Previously, most of the interpolation procedures utilized the first degree interpolators with special error correction procedure which aim to eliminate additional embedded multiplications. However, the interpolation procedure for this research was elevated up to a second degree interpolation. Proper design process, investigation, and analysis were done for these interpolation configurations in positive region by standardizing the same co-transformation procedure, which is the extended range, second order co-transformation. Newton divided differences turned out to be the best interpolator for second degree implementation of LNS addition and subtraction, with the best-achieved BTFP rate of +0.4514 and reduction of memory consumption compared to the same arithmetic used in the European Logarithmic Microprocessor (ELM) up to 51%.
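The second-degree Newton divided-difference interpolation named in the abstract above can be illustrated with a minimal software sketch. This is not the cited architecture: the function names (`sb`, `newton_quadratic`, `lns_add`), the node spacing `step`, and the use of floating-point math in place of lookup tables are all assumptions made for illustration, and the co-transformation used for subtraction is not modelled.

```python
import math


def sb(r):
    """Exact LNS addition function log2(1 + 2**r)."""
    return math.log2(1.0 + 2.0 ** r)


def newton_quadratic(f, x0, x1, x2, x):
    """Second-degree Newton divided-difference interpolant of f through x0, x1, x2."""
    f0, f1, f2 = f(x0), f(x1), f(x2)
    d01 = (f1 - f0) / (x1 - x0)      # first divided difference f[x0, x1]
    d12 = (f2 - f1) / (x2 - x1)      # first divided difference f[x1, x2]
    d012 = (d12 - d01) / (x2 - x0)   # second divided difference f[x0, x1, x2]
    return f0 + d01 * (x - x0) + d012 * (x - x0) * (x - x1)


def lns_add(i, j, step=0.125):
    """Return an approximation of log2(2**i + 2**j) from the logarithms i and j.

    Uses log2(a + b) = hi + log2(1 + 2**(lo - hi)), with the second term
    interpolated from three equally spaced nodes (hypothetical spacing `step`).
    In hardware the node values would come from a table; here they are
    computed directly for simplicity.
    """
    hi, lo = max(i, j), min(i, j)
    r = lo - hi                           # r <= 0
    x0 = math.floor(r / step) * step      # left node at or below r
    return hi + newton_quadratic(sb, x0, x0 + step, x0 + 2 * step, r)
```

For example, `lns_add(math.log2(3), math.log2(5))` should land close to 3, since 3 + 5 = 8. A smaller `step` trades more stored nodes for lower interpolation error, which is the memory-versus-accuracy balance the abstract discusses.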


“…The wide range of LIP implementation listed in [4]-[7] show the positive impact of logarithmic usage for image related implementation in DSP. Surprisingly, the operational speed for LNS could outperformed the FLP system for up to 50% [8]-[11].…”

Section: Introduction (mentioning)

confidence: 96%

Less memory and high accuracy logarithmic number system architecture for arithmetic operations

Naziri, Ismail, Isa, et al. 2021. IJEECS.

“…Even though the Taylor-series expansion algorithm with powering units exhibits the highest performance among multiplicative algorithms, it consumes a larger area because the architecture consists of four multipliers, which is not suitable for area-critical applications. In earlier work, we presented a fused floating-point multiply-divide unit based on Taylor-series expansion with powering units where all multiply operations are executed by one multiplier to maximize the area efficiency, while achieving high performance by using a pipelined architecture [5] [6]. By sharing the 2-stage pipelined multiplier among the multiply operations in the algorithm, the latency becomes longer (12 clock cycles) than the direct implementation of the original algorithm (8 clock cycles).…”

Section: Proposed Fp-mul/div/sqrt Fused Unit (mentioning)

confidence: 99%

Floating-point division and square root using a Taylor-series expansion algorithm

Kwon, Draper. 2009. Microelectronics Journal.

“…Even though the Taylor-series expansion algorithm with powering units exhibits the highest performance among multiplicative algorithms, it consumes a larger area because the architecture consists of four multipliers, which is not suitable for area-critical applications. In earlier work, we presented a fused floating-point multiply-divide unit based on a Taylor-series expansion algorithm with powering units where all multiply operations are executed by one multiplier to maximize the area efficiency, while achieving high performance by using a pipelined architecture [5] [6]. By sharing the 2-stage pipelined multiplier among the multiply operations in the algorithm, the latency becomes longer (12 clock cycles) than the direct implementation of the original algorithm (8 clock cycles).…”

Section: Division / Square Root Algorithms (mentioning)

confidence: 99%

Floating-point division and square root implementation using a Taylor-series expansion algorithm

Kwon, Sondeen, Draper. 2008. 2008 15th IEEE International Conference on Electronics, Circuits and Systems.

Abstract: Hardware support for floating-point (FP) arithmetic is an essential feature of modern microprocessor design. Although division and square root are relatively infrequent operations in traditional general-purpose applications, they are indispensable and becoming increasingly important in many modern applications. In this paper, a fused floating-point multiply/divide/square root unit based on Taylor-series expansion algorithm is presented. The implementation results of the proposed fused unit based on standard cell methodology in IBM 90nm technology exhibits that the incorporation of square root function to an existing multiply/divide unit requires only a modest 23% area increase and the same low latency for divide and square root operation can be achieved (12 cycles). The proposed arithmetic unit also exhibits a reasonably good area-performance balance.
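As a rough illustration of the Taylor-series expansion idea behind the divide operation described above, the following sketch computes a quotient from an approximate reciprocal and a short power series. It is a software model under stated assumptions, not the authors' hardware: the names `taylor_reciprocal` and `taylor_divide`, the seed table width `lut_bits`, and the number of series terms are hypothetical, the divisor is assumed to be a normalized significand in [1, 2), and exponent handling, rounding, pipelining, and the square root path are all omitted.

```python
def taylor_reciprocal(b, terms=4, lut_bits=8):
    """Approximate 1/b for b in [1, 2) with a truncated Taylor series.

    With a seed y0 close to 1/b and m = 1 - b*y0, the identity
    1/b = y0 / (1 - m) = y0 * (1 + m + m**2 + m**3 + ...) holds for |m| < 1.
    """
    if not 1.0 <= b < 2.0:
        raise ValueError("b must be a normalized significand in [1, 2)")
    # Hypothetical seed: reciprocal of the midpoint of the table interval
    # that contains b (stands in for a small lookup table in hardware).
    step = 2.0 ** -lut_bits
    midpoint = int(b / step) * step + step / 2
    y0 = 1.0 / midpoint
    m = 1.0 - b * y0          # small residual, roughly |m| <= 2**-(lut_bits + 1)
    series = 0.0
    power = 1.0               # holds m**k; hardware would use powering units
    for _ in range(terms):
        series += power
        power *= m
    return y0 * series


def taylor_divide(a, b, terms=4, lut_bits=8):
    """Approximate a / b as a * (1/b) using the series reciprocal above."""
    return a * taylor_reciprocal(b, terms=terms, lut_bits=lut_bits)
```

Each term of the series costs a multiplication, which is why routing every multiply through one shared multiplier, as the citing passages describe, saves area at the price of extra cycles.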
