Technology Design and Manufacturing

This article proposes an efficient implementation of a multiply-accumulate circuit (MAC) for high-speed floating point arithmetic. Real-world applications such as digital signal processing demand high-performance computation with high accuracy. In general, digital signals are represented as sequences of signed or unsigned, fixed or floating point numbers. The final result of a MAC operation can be computed by feeding the mantissa of the previous MAC result as one of the partial products to a Wallace tree multiplier or a Braun multiplier. This removes the separate accumulation circuit while keeping the circuit depth within the bounds of the Wallace tree multiplier, O(log_2 n), or the Braun multiplier, O(n). Three kinds of floating point MACs are proposed. Experimental results with a 45nm technology library show that, without a pipeline, the proposed floating point MAC using a radix-2 Wallace structure achieves a 48.54% improvement in worst path delay over a conventional floating point MAC. The same proposed design, without a pipeline and using a radix-4 Braun structure, gives a 39.92% improvement in worst path delay over a conventional design. A radix-32, Q32.32-format-based floating point MAC is also proposed using a Wallace tree/Braun multiplier. In addition, this article discusses the MSB prediction problem in floating point arithmetic and a solution to it that is not available in modern fused multiply-add designs. The performance results compare the proposed floating point MAC with various floating point MAC designs for radix-2, -4, -8, and -16. The proposed design has a smaller depth than a conventional floating point MAC and a lower area requirement than other floating point MAC implementations, both with and without a pipeline.
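The core idea of feeding the previous result into the multiplier as an extra partial product can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it works on non-negative integers standing in for mantissas, uses radix-2 partial products and layers of 3:2 carry-save adders in the style of a Wallace tree, and ignores exponent alignment, signs, rounding, and normalization. All function names (`partial_products`, `csa`, `wallace_reduce`, `mac`) are hypothetical and chosen only for illustration.

```python
def partial_products(a, b, width):
    """Radix-2 partial products of a*b: one shifted copy of a per set bit of b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def csa(x, y, z):
    """3:2 carry-save adder: returns (s, c) such that x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_reduce(rows):
    """Reduce rows to at most two using layers of 3:2 compressors.

    Returns the remaining rows and the number of compressor layers used.
    """
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3  # rows that form complete groups of three
        nxt = []
        for i in range(0, full, 3):
            s, c = csa(rows[i], rows[i + 1], rows[i + 2])
            nxt += [s, c]
        nxt += rows[full:]  # leftover rows pass through to the next layer
        rows = nxt
        levels += 1
    return rows, levels


def mac(a, b, acc, width=8):
    """Compute a*b + acc by treating acc as one more partial-product row.

    Assumes a, b, acc are non-negative and b fits in `width` bits.
    """
    rows = partial_products(a, b, width) + [acc]
    final_rows, levels = wallace_reduce(rows)
    return sum(final_rows), levels  # the last add models the final carry-propagate adder


if __name__ == "__main__":
    result, levels = mac(13, 11, 100)
    print(result, levels)
```

In this sketch the accumulator adds only one row to the reduction tree, so for an 8-bit multiplier the number of compressor layers is the same with or without the accumulate input, which is the intuition behind keeping the depth within the multiplier's own bound.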
ACM Reference Format: Mohamed Asan Basiri M and Noor Mahammad Sk. 2014. An efficient hardware-based higher radix floating point MAC design.