Abstract-The floating-point unit is one of the most common building block in any computing system and is used for a huge number of applications. By combining two state-of-the-art techniques of imprecise hardware, namely Gate-Level Pruning and Inexact Speculative Adder, and by introducing a novel Inexact Speculative Multiplier architecture, three different approximate FPUs and one reference IEEE-754 compliant FPU have been integrated in a 65 nm CMOS process within a low-power multi-core processor. Silicon measurements show up to 27 % power, 36 % area and 53 % power-area product savings compared to the IEEE-754 single-precision FPU. Accuracy loss has been evaluated with a high-dynamic-range image tone-mapping algorithm, resulting in small but non-visible errors with image PSNR value of 90 dB.
I. INTRODUCTIONWith the forecasted end of Moore's law and the increasing complexity to design and fabricate integrated circuits, power and reliability have become the main challenges to technology scaling. Power has definitely emerged as a critical issue due to the poor scaling of V DD and V th , while transistor miniaturization reaching the nanoscopic scale has led to extreme Process-Voltage-Temperature (PVT) variations. Unfortunately, achieving low power and robustness against PVT variations requires complicated and conflicting design constraints. As a consequence, designers are being pushed to seek for new energy-efficient circuit design and computing techniques to meet the exploding demand of data processing from mobile devices and cloud services.Approximate computing [1, 2] has emerged as a promising solution to sustain computing advancement and overcome the limitations in technology scaling. This approach explores a new trade-off between energy or circuit costs versus application accuracy. A myriad of applications could tolerate trading off a little bit of accuracy without compromising their functionality or user experience. In multimedia applications for instance, a small proportion of errors remains imperceptible to humans.To design approximate systems, several approaches have been investigated at different hardware levels, such as voltagefrequency over-scaling [3] at physical level or significancebased memory protection [4] at algorithmic level. At circuit level, an interesting approach is to perform computations using approximate arithmetic operators, such as adders and multipliers, allowing a controlled and limited amount of errors against significant power saving or performance increase. This paper focuses on two of these techniques: Gate-level Pruning [5] and Inexact Speculative Adder [6], which have both demonstrated significant savings simultaneously in energy, delay and area at the cost of reasonable errors.