A dual floating point coprocessor with an FMAC architecture

Heikes, C.; Colon-Bonet, Glenn

doi:10.1109/isscc.1996.488714

Cited by 16 publications

(13 citation statements)

References 2 publications


“…To be more specific in the descriptions, we consider the IEEE double-precision format, but we do not discuss neither special nor denormalized numbers. The necessary steps in the traditional implementation of the MAF unit [7], used in some recent floating-point units of general-purpose processors [5,6,12], are: 3. Normalization and rounding 2 .…”

Section: Floating-point MAF (mentioning)

confidence: 99%

“…Consequently, considering that the exponent difference for the MAF operation is d = exp(A) − (exp(B) + exp(C)) and taking into account that the multiplication can produce an overflow, the CLOSE datapath is used for effective multiply-subtractions with (1) an exponent difference d = 0, 1, (2) an exponent difference d = 2 and OVF(B × C) = 1, and (3) an exponent difference d = −1 and OVF(B × C) = 0. The FAR datapath is used for the remaining cases.…”

Section: General Structure of the Proposed MAF (mentioning)

confidence: 99%
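
The CLOSE/FAR selection rule in the statement above can be sketched in a few lines. This is a minimal illustration, not the citing paper's hardware logic: the function name `select_datapath`, its parameters, and the use of plain unbiased integer exponents are all assumptions made for the example.

```python
def select_datapath(exp_a, exp_b, exp_c, product_overflow, effective_subtraction):
    """Return "CLOSE" or "FAR" following the quoted selection rule.

    Assumptions (hypothetical, for illustration only):
      - exp_a, exp_b, exp_c are unbiased integer exponents of A, B, C
      - product_overflow is True when OVF(B x C) = 1
      - effective_subtraction is True for an effective multiply-subtraction
    """
    # Exponent difference as defined in the quoted text.
    d = exp_a - (exp_b + exp_c)

    # The CLOSE datapath is only used for effective multiply-subtractions.
    if not effective_subtraction:
        return "FAR"

    if d in (0, 1):                       # case (1)
        return "CLOSE"
    if d == 2 and product_overflow:       # case (2)
        return "CLOSE"
    if d == -1 and not product_overflow:  # case (3)
        return "CLOSE"

    # All remaining cases go to the FAR datapath.
    return "FAR"
```

For example, `select_datapath(5, 2, 3, False, True)` has d = 0 and so selects the CLOSE datapath, while the same exponents with an effective addition select FAR.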

“…The floating-point unit of several recent commercial general-purpose processors include as a key feature a unified floating-point multiply-add fused (MAF) unit [5,6,12,14]. This unit executes the single or double-precision multiply-add, A+(B×C), as a single instruction, with no intermediate rounding.…”

Section: Introduction (mentioning)

confidence: 99%
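
The phrase "with no intermediate rounding" in the statement above can be made concrete with a small software model. The sketch below is an assumption-laden illustration, not how any of the cited units is built: it uses Python's exact `fractions.Fraction` arithmetic to emulate a single final rounding for finite double-precision inputs, and the names `maf_fused` and `maf_unfused` are hypothetical.

```python
from fractions import Fraction


def maf_fused(a, b, c):
    """Model A + (B x C) with a single rounding at the end.

    The sum and product are formed exactly as rationals, then converted
    to a double once. Only valid for finite inputs (an assumption).
    """
    exact = Fraction(a) + Fraction(b) * Fraction(c)
    return float(exact)


def maf_unfused(a, b, c):
    """A + (B x C) as two separate operations: the product is rounded
    to double precision before the addition is performed."""
    return a + b * c


# Inputs chosen so the exact product 1 - 2**-60 rounds to 1.0 as a double.
a = -1.0
b = 1.0 + 2.0 ** -30
c = 1.0 - 2.0 ** -30

fused = maf_fused(a, b, c)      # keeps the -2**-60 residual
unfused = maf_unfused(a, b, c)  # residual is lost when b * c rounds to 1.0
```

With these inputs the unfused result is exactly zero, while the fused model retains the small residual of the product.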


Floating-Point Fused Multiply-Add: Reduced Latency for Floating-Point Addition

Bruguera

Lang

17th IEEE Symposium on Computer Arithmetic (ARITH'05)


In this paper we propose an architecture for the computation of the double-precision floating-point multiply-add fused (MAF) operation A + (B × C) that permits to compute the floating-point addition with lower latency than floating-point multiplication and MAF. While previous MAF architectures compute the three operations with the same latency, the proposed architecture permits to skip the first pipeline stages, those related with the multiplication B × C, in case of an addition. For instance, for a MAF unit pipelined into three or five stages, the latency of the floating-point addition is reduced to two or three cycles, respectively. To achieve the latency reduction for floating-point addition, the alignment shifter, which in previous organizations is in parallel with the multiplication, is moved so that the multiplication can be bypassed. To avoid that this modification increases the critical path, a double-datapath organization is used, in which the alignment and normalization are in separate paths. Moreover, we use the techniques developed previously of combining the addition and the rounding and of performing the normalization before the addition.


“…Floating-point (FP) addition is the most frequent FP operation and FP adders are therefore critically important components in modern microprocessors [4,6,7,12,5] and digital signal processors [23]. FP adders must be fast to match the increasing clock rates demanded by deep submicron technologies with a small number of pipelining stages to minimise latency and improve branch resolution time.…”

Section: Introduction (mentioning)

confidence: 99%

“…They also discuss how to construct faster FP adders. Implementations of FP adders are reported in [6,7,12,5,9,13,10]. Algorithms and circuits which have been used to improve their design are described in [17,8,3,20,16,21,15,22,19].…”

Section: Introduction (mentioning)

confidence: 99%