2005 IEEE International Symposium on Circuits and Systems
DOI: 10.1109/iscas.2005.1465341

Design Trade-Offs in Floating-Point Unit Implementation for Embedded and Processing-In-Memory Systems

Cited by 13 publications (6 citation statements)
References 7 publications
“…This work focussed on the improvements of the logarithmic arithmetic unit design in [11] by enhancing the two important procedures for the LNS addition and subtraction: interpolation and cotransformation. The new arrangement of the two procedures would be able to improve the addition and subtraction operation in LNS, which could represent a whole new logarithmic arithmetic unit architecture.…”
Section: Methods (mentioning)
confidence: 99%
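The LNS addition and subtraction mentioned above rest on two standard identities. With a = log2(x) and b = log2(y), log2(x + y) = max(a, b) + sb(d) and log2(x - y) = a + db(d), where d = -|a - b|, sb(d) = log2(1 + 2^d) and db(d) = log2(1 - 2^d). The sketch below approximates sb by linear interpolation over a sampled table and evaluates db exactly, since db diverges as d approaches 0, which is the region a cotransformation handles in hardware. It is a generic illustration under stated assumptions, not the unit of [11] or the citing work: the table range, the step size, and all function names are hypothetical, and the cotransformation itself is not modeled.

```python
import math

# Hypothetical table parameters (not from the cited work): sb(d) is sampled
# on [D_MIN, 0] at a fixed STEP and linearly interpolated between samples.
STEP = 1.0 / 64
D_MIN = -16.0
N = int(-D_MIN / STEP) + 1
SB_TABLE = [math.log2(1.0 + 2.0 ** (D_MIN + i * STEP)) for i in range(N)]


def sb_interp(d: float) -> float:
    """Approximate sb(d) = log2(1 + 2**d) for d <= 0 by table interpolation."""
    if d <= D_MIN:
        # Below the table range sb(d) < 2.2e-5, so it is treated as zero.
        return 0.0
    pos = (d - D_MIN) / STEP
    i = min(int(pos), N - 2)  # keep i + 1 inside the table when d == 0
    frac = pos - i
    return SB_TABLE[i] + frac * (SB_TABLE[i + 1] - SB_TABLE[i])


def lns_add(a: float, b: float) -> float:
    """Return log2(x + y) given a = log2(x), b = log2(y), with x, y > 0."""
    hi, lo = max(a, b), min(a, b)
    return hi + sb_interp(lo - hi)


def lns_sub(a: float, b: float) -> float:
    """Return log2(x - y) given a = log2(x) > b = log2(y).

    db(d) = log2(1 - 2**d) diverges as d -> 0; it is evaluated exactly here,
    whereas a hardware unit would use a cotransformation near that point.
    """
    if a <= b:
        raise ValueError("sketch assumes x > y (a > b)")
    return a + math.log2(1.0 - 2.0 ** (b - a))
```

For example, `lns_add(math.log2(3), math.log2(5))` comes out close to log2(8) = 3; with this step size the linear-interpolation error of sb stays well below 1e-4, and a finer table trades memory for accuracy.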
“…The wide range of LIP implementations listed in [4]-[7] shows the positive impact of logarithmic usage for image-related implementation in DSP. Surprisingly, the operational speed of LNS could outperform the FLP system by up to 50% [8]-[11].…”
Section: Introduction (mentioning)
confidence: 96%
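One commonly cited reason LNS can be fast is that multiplication and division, which are costly in floating point, reduce to a sign XOR plus a fixed-point addition or subtraction of the log field. The sketch below shows only that property; it does not reproduce the speed figures quoted above. The encoding (a sign bit plus a signed log2 magnitude with 16 fractional bits) and all names are assumptions chosen for illustration, and zero, which has no finite logarithm, is simply rejected.

```python
import math

# Hypothetical format (an assumption, not the cited papers'): a sign bit plus
# a signed fixed-point log2 magnitude with FRAC_BITS fractional bits.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_lns(x: float) -> tuple[int, int]:
    """Encode a nonzero real as (sign, round(log2|x| * 2**FRAC_BITS))."""
    if x == 0:
        raise ValueError("zero needs a special encoding in LNS")
    return (1 if x < 0 else 0), round(math.log2(abs(x)) * ONE)


def from_lns(v: tuple[int, int]) -> float:
    """Decode (sign, fixed-point log) back to a float."""
    sign, log_fixed = v
    mag = 2.0 ** (log_fixed / ONE)
    return -mag if sign else mag


def lns_mul(u: tuple[int, int], v: tuple[int, int]) -> tuple[int, int]:
    # Multiplication: XOR the signs, add the fixed-point logs (one integer add).
    return u[0] ^ v[0], u[1] + v[1]


def lns_div(u: tuple[int, int], v: tuple[int, int]) -> tuple[int, int]:
    # Division: XOR the signs, subtract the fixed-point logs.
    return u[0] ^ v[0], u[1] - v[1]
```

The cost moves to addition and subtraction, which need the nonlinear sb/db functions; that is why interpolation and cotransformation dominate LNS arithmetic-unit design.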
“…Even though the Taylor-series expansion algorithm with powering units exhibits the highest performance among multiplicative algorithms, it consumes a larger area because the architecture consists of four multipliers, which is not suitable for area-critical applications. In earlier work, we presented a fused floating-point multiply-divide unit based on Taylor-series expansion with powering units where all multiply operations are executed by one multiplier to maximize the area efficiency, while achieving high performance by using a pipelined architecture [5] [6]. By sharing the 2-stage pipelined multiplier among the multiply operations in the algorithm, the latency becomes longer (12 clock cycles) than the direct implementation of the original algorithm (8 clock cycles).…”
Section: Proposed Fp-mul/div/sqrt Fused Unit (mentioning)
confidence: 99%
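The Taylor-series approach to division can be illustrated with a generic formulation: split the divisor mantissa y into its leading bits yh and a small remainder yl, look up 1/yh, and expand 1/y = (1/yh) · 1/(1 + e) with e = yl/yh as the alternating series 1 - e + e^2 - …. The sketch below is not the cited unit's datapath, and the 8-bit table index, the number of series terms, and the function names are all hypothetical. It only shows where the multiplies come from; in the cited work every such multiply is mapped onto a single shared 2-stage pipelined multiplier rather than separate hardware.

```python
import math

# Hypothetical parameter: number of leading fraction bits of the divisor that
# index the reciprocal table (a ROM in hardware).
M_BITS = 8


def split_divisor(y: float) -> tuple[float, float]:
    """Split a mantissa y in [1, 2) into yh (leading M_BITS bits) and yl = y - yh."""
    scale = 1 << M_BITS
    yh = math.floor(y * scale) / scale
    return yh, y - yh


def taylor_divide(x: float, y: float, terms: int = 3) -> float:
    """Approximate x / y using 1/y = (1/yh) * 1/(1 + e), with e = yl / yh."""
    yh, yl = split_divisor(y)
    r = 1.0 / yh                # stands in for the reciprocal-table lookup
    e = yl * r                  # 0 <= e < 2**-M_BITS
    # Truncated series 1 - e + e^2 - ...; the e**k terms are what powering
    # units would produce in hardware.
    series = sum((-e) ** k for k in range(terms))
    return x * r * series
```

Because the series alternates with shrinking terms, the truncation error is on the order of e**terms, so with these settings three terms already give a result accurate to roughly 2**-24 relative; adding table bits or terms trades area for accuracy.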
“…Even though the Taylor-series expansion algorithm with powering units exhibits the highest performance among multiplicative algorithms, it consumes a larger area because the architecture consists of four multipliers, which is not suitable for area-critical applications. In earlier work, we presented a fused floating-point multiply-divide unit based on a Taylor-series expansion algorithm with powering units where all multiply operations are executed by one multiplier to maximize the area efficiency, while achieving high performance by using a pipelined architecture [5] [6]. By sharing the 2-stage pipelined multiplier among the multiply operations in the algorithm, the latency becomes longer (12 clock cycles) than the direct implementation of the original algorithm (8 clock cycles).…”
Section: Division / Square Root Algorithms (mentioning)
confidence: 99%