Distributed arithmetic (DA) is an efficient way to implement sum-of-products units, which are used in applications such as FIR filters, filter banks, image processing, and convolutional neural networks in IoT hardware accelerators. In this paper, a low-power architecture is proposed that is built on two key steps, coarse-grain power-performance partitioning and fine-grain separated power supplies, together with a tuned level shifter; the approach can be applied to any digital system implemented from an RTL design. To examine the new design approach, the power consumption and critical delay path are first explored, and the power supplies of the datapath and the controller in DA are then separated; two methods are proposed, with and without an implicit level shifter. To investigate power consumption and timing parameters, 5-tap and 16-tap FIR filters are implemented in a standard 65-nm CMOS technology. In the first proposed method (Proposed1), the dynamic and static power consumption of the 16-tap (5-tap) FIR filter are reduced by 30% (15%) and 59% (59%), respectively. However, lowering the supply voltage significantly increases the delay and slope of the signal generated in the shift register (shiftreg). To address this issue, a second method is proposed that reduces dynamic power by 26% (14%) and static power by 55% (58%) without adding any delay at the shiftreg output compared with a conventional DA-based FIR filter that uses a single power supply.
Keywords: distributed arithmetic, FIR filter, inner product, low-power, power-performance partitioning, RTL design
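To make the DA principle behind these filters concrete, the following is a minimal behavioral sketch in Python, not the authors' RTL design. It assumes integer coefficients and two's-complement input samples of a fixed bit width; the function names, the 5-tap coefficients, and the 4-bit width are hypothetical. The look-up table (LUT) stores every possible partial sum of the coefficients, so the inner product is obtained with one LUT read and one shift-and-add per input bit instead of one multiplication per tap. Building the full 2^K-entry LUT is practical only for small K; large filters are commonly split into several smaller LUTs.

```python
from typing import List, Sequence


def build_da_lut(coeffs: Sequence[int]) -> List[int]:
    """Precompute all 2**K partial sums of the K coefficients.

    Entry `addr` holds the sum of coeffs[k] for every bit k set in addr.
    """
    size = 1 << len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(size)
    ]


def da_inner_product(lut: Sequence[int], samples: Sequence[int], width: int) -> int:
    """Bit-serial DA inner product for `width`-bit two's-complement samples."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if any(not lo <= x <= hi for x in samples):
        raise ValueError("sample does not fit in the given bit width")
    acc = 0
    for bit in range(width):  # LSB first, as in a bit-serial shift register
        # Form the LUT address from bit `bit` of every sample (one bit per tap).
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> bit) & 1) << k
        partial = lut[addr] << bit  # shift-and-add replaces the multipliers
        # The sign bit of a two's-complement word carries negative weight.
        acc += -partial if bit == width - 1 else partial
    return acc


def da_fir(coeffs: Sequence[int], signal: Sequence[int], width: int) -> List[int]:
    """Direct-form FIR filter whose inner product is computed with DA."""
    lut = build_da_lut(coeffs)
    history = [0] * len(coeffs)  # history[k] holds x[n - k]
    out = []
    for x in signal:
        history = [x] + history[:-1]
        out.append(da_inner_product(lut, history, width))
    return out


def mac_fir(coeffs: Sequence[int], signal: Sequence[int]) -> List[int]:
    """Reference FIR filter using an explicit multiply-accumulate loop."""
    history = [0] * len(coeffs)
    out = []
    for x in signal:
        history = [x] + history[:-1]
        out.append(sum(c * h for c, h in zip(coeffs, history)))
    return out


if __name__ == "__main__":
    taps = [3, -1, 4, 1, -5]       # hypothetical 5-tap coefficients
    x = [7, -8, 2, 0, -3, 5, -1]   # 4-bit two's-complement samples
    print(da_fir(taps, x, width=4))
    print(mac_fir(taps, x))
```

Both filters should print the same sequence. In a bit-serial hardware DA unit, each iteration of the `bit` loop would roughly correspond to one clock cycle: the shift registers supply one bit of each sample, the LUT supplies the partial sum, and a shift-accumulator takes the place of the MAC multipliers.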
Introduction and literature review
Inner product computation is an essential operation in digital signal processing (DSP) applications targeting image, audio, and video processing, as well as in various machine learning applications [1]. It can be implemented with a conventional multiply-accumulate (MAC) structure or with an improved MAC method called distributed arithmetic (DA) to compute the sum of products. The MAC structure consumes considerable power and is not cost-effective for high-performance systems because it requires a large number of multipliers. The DA structure is a bit-level rearrangement of