Architecture Exploration of High-Performance Floating-Point Fused Multiply-Add Units and their Automatic Use in High-Level Synthesis

Liebig, Björn; Huthmann, Jens; Koch, Andreas

doi:10.1109/ipdpsw.2013.106

Cited by 4 publications

(1 citation statement)

References 20 publications


“…Although this architecture [16] can remove the carry propagation in the final adder, it requires a (2N+α-1)-bit accumulator. Besides, it is also noteworthy to mention that the concept of this architecture [16] has been used in modern floating-point fused multiply-add (FMA) designs [17], [18].…”

Section: Introduction (mentioning)

confidence: 99%

A High-Performance Multiply-Accumulate Unit by Integrating Additions and Accumulations Into Partial Product Reduction Process

Tung, Huang. 2020. IEEE Access


In this paper, we propose a low-power high-speed pipeline multiply-accumulate (MAC) architecture. In a conventional MAC, carry propagations of additions (including additions in multiplications and additions in accumulations) often lead to large power consumption and large path delay. To resolve this problem, we integrate a part of additions into the partial product reduction (PPR) process. In the proposed MAC architecture, the addition and accumulation of higher significance bits are not performed until the PPR process of the next multiplication. To correctly deal with the overflow in the PPR process, a small-size adder is designed to accumulate the total number of carries. Compared with previous works, experimental results show that the proposed MAC architecture can greatly reduce both power consumption and circuit area under the same timing constraint.

INDEX TERMS: Digital circuits, logic circuits, multiplying circuits, pipeline processing, power dissipation.
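The general idea behind deferring carry propagation can be illustrated in software. The following is a minimal sketch, not the architecture proposed in the paper: it assumes non-negative integer operands, uses a plain 3:2 carry-save compressor, and feeds each partial product of the multiplication into the running (sum, carry) pair so that only one carry-propagating addition happens at the very end. The function names `carry_save_add` and `mac_carry_save` are hypothetical and chosen for illustration only.

```python
def carry_save_add(s, c, x):
    """3:2 compressor: reduce three words to a (sum, carry) pair.

    No carry ripples across bit positions here; the invariant is
    s + c + x == new_s + new_c (for non-negative integers).
    """
    new_s = s ^ c ^ x                           # bitwise sum without carries
    new_c = ((s & c) | (s & x) | (c & x)) << 1  # carries, shifted to their weight
    return new_s, new_c


def mac_carry_save(pairs):
    """Accumulate sum(a * b) over pairs, keeping the result in carry-save form.

    Illustrative assumption: operands are non-negative integers.
    """
    s, c = 0, 0
    for a, b in pairs:
        i = 0
        while b >> i:
            if (b >> i) & 1:
                # Each partial product (a << i) goes straight into the
                # running carry-save accumulator.
                s, c = carry_save_add(s, c, a << i)
            i += 1
    # The only carry-propagating addition, done once at the end.
    return s + c


if __name__ == "__main__":
    print(mac_carry_save([(3, 5), (7, 11), (2, 9)]))
```

In hardware the final addition is the expensive step, which is why keeping the accumulator in redundant form across multiplications can shorten the critical path; the paper's contribution concerns how the higher-significance bits and overflow carries are handled, which this sketch does not model.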


Design of MAC unit for digital filters in signal processing and communication

Harish, Rukmini, Sivani. 2021. Int J Speech Technol


Efficient implementation of a single-precision floating-point arithmetic unit on FPGA

Jose, Silva, Neto, et al. 2014. 2014 24th International Conference on Field Programmable Logic and Applications (FPL)


This paper presents a single-precision floating-point arithmetic unit with support for multiplication, addition, fused multiply-add, reciprocal, square root and inverse square root, with high performance and low resource usage. The design uses a piecewise second-order polynomial approximation to implement reciprocal, square root and inverse square root. The unit can be configured with any number of operations and can calculate any function with a throughput of one operation per cycle. The floating-point multiplier of the unit is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx CoreGen operators, and conclude that the approach has a high relative performance/area efficiency.
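A piecewise second-order polynomial approximation can be sketched as follows for the reciprocal. This is an illustration under stated assumptions, not the paper's design: the input is assumed to be a significand in [1, 2), the interval is split into 64 equal segments, and the coefficients come from a Taylor expansion around each segment midpoint (a hardware unit would more likely store precomputed, possibly minimax, coefficients in a lookup table). The name `reciprocal_piecewise` and the `segments` parameter are hypothetical.

```python
def reciprocal_piecewise(x, segments=64):
    """Approximate 1/x for x in [1, 2) with one quadratic per segment.

    Illustrative assumptions: equal-width segments and Taylor coefficients
    taken at the segment midpoint m:
        1/x ~= 1/m - (x - m)/m**2 + (x - m)**2 / m**3
    """
    if not 1.0 <= x < 2.0:
        raise ValueError("x must lie in [1, 2)")
    width = 1.0 / segments
    index = int((x - 1.0) * segments)   # which segment x falls into
    m = 1.0 + (index + 0.5) * width     # segment midpoint
    d = x - m                           # offset from the midpoint, |d| <= width / 2
    inv_m = 1.0 / m                     # in hardware this would be a table entry
    return inv_m - d * inv_m**2 + d * d * inv_m**3


if __name__ == "__main__":
    x = 1.37
    print(reciprocal_piecewise(x), 1.0 / x)
```

With 64 segments the truncation error of the quadratic is bounded by (1/128)^3, roughly 4.8e-7, over [1, 2); more segments or tuned coefficients would be needed to reach full single-precision accuracy.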
