“…Since, no redundant computation is available in lifting 2-D DWT, there is no scope to reduce multiplier complexity without compromising on the throughput rate. However, we observe that many multipliers of block-lifting 2-D DWT structures [12,13,15,17] share a common input operand. A group of multipliers with a common multiplying operand can select their partial product terms from a common set using Booth encoding scheme.…”
In this paper, we present a regular partial product array (PPA) for radix-8 Booth multiplication, obtained by removing the extra row at the cost of a small overhead. A radix-8 multiplier design based on the regular PPA is proposed, which offers a 10.7 % saving in area-delay product (ADP) over the existing radix-8 multiplier design. The n lower-order bits of the 2n-bit output of the full-width multiplier are truncated to obtain a fixed-width multiplier with low truncation error, where n is the operand bit-width. A few redundant logic operations arise in the adder unit when the n lower-order bits of the 2n-bit multiplier output are truncated. A specific design is necessary because modern synthesis tools remove this redundant logic only partially. We present an optimized adder unit design, with the redundant logic removed, for the post-truncated fixed-width radix-8 Booth multiplier. Comparison results show that the proposed post-truncated fixed-width multiplier design offers nearly 20.7 % ADP saving and 18.3 % power saving over the existing radix-8 design optimized by the Synopsys Design Compiler when the 2n-bit output is post-truncated to n bits. Multipliers are often used for multiplication by a constant. The value of the constant may be fixed, or it may be changed at run-time by the user. A multiplier that multiplies by a fixed constant is referred to as a fixed-constant multiplier, and one that multiplies by a constant that changes at run-time is referred to as a generic-constant multiplier. Both radix-4 and radix-8 Booth multiplier designs can easily be configured as generic-constant multipliers. However, the radix-8 multiplier design saves some area and delay when configured for constant multiplication, whereas the radix-4 multiplier design does not have this feature.
We find that the proposed 12-bit full-width and fixed-width radix-8 generic-constant multiplier designs involve, respectively, 19.4 % and 24.7 % less ADP than the existing radix-4 full-width and post-truncated multiplier designs configured for constant multiplication. The existing block-based lifting 2-D DWT structure is synthesized using the proposed radix-8 generic-constant fixed-width multiplier design to demonstrate the effectiveness of the proposed multiplier designs. We find that the existing lifting 2-D DWT structure of block size 16 and word length 12 offers 19.3 % ADP saving and 11.5 % power saving when the constant multipliers are implemented using the proposed radix-8 multiplier design instead of the existing radix-4 multiplier design. (Abhishek Choubey, Circuits Syst Signal Process)
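As a rough illustration of the arithmetic this abstract refers to, the following Python sketch recodes a two's-complement multiplier into radix-8 Booth digits, selects each partial product from the shared set {0, X, 2X, 3X, 4X}, and post-truncates the 2n-bit product to its n higher-order bits. It is a behavioural model only, not the paper's regular PPA or optimized adder unit; the function names and the floor-style truncation are assumptions made for this example.

```python
def booth_radix8_digits(y, n):
    """Recode an n-bit two's-complement integer y into radix-8 Booth digits.

    Each digit lies in [-4, 4]; digit i has weight 8**i.
    """
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y does not fit in n bits")
    num_digits = -(-n // 3)  # ceil(n / 3)

    def bit(k):
        # Bit k of y; index -1 is the implicit 0 appended below the LSB.
        # Python's >> on negative ints is arithmetic, so high bits sign-extend.
        return 0 if k < 0 else (y >> k) & 1

    digits = []
    for i in range(num_digits):
        b = 3 * i
        digits.append(-4 * bit(b + 2) + 2 * bit(b + 1) + bit(b) + bit(b - 1))
    return digits


def booth_radix8_multiply(x, y, n):
    """Full-width product x * y built from radix-8 Booth partial products."""
    # Shared set of multiples of x; 3x is the "hard" multiple that needs an adder.
    multiples = {0: 0, 1: x, 2: x << 1, 3: (x << 1) + x, 4: x << 2}
    product = 0
    for i, d in enumerate(booth_radix8_digits(y, n)):
        pp = multiples[abs(d)]
        product += (-pp if d < 0 else pp) << (3 * i)
    return product


def post_truncate(product, n):
    """Drop the n lower-order bits of a 2n-bit product (assumed floor truncation)."""
    return product >> n
```

For example, `booth_radix8_digits(45, 12)` yields `[-3, -2, 1, 0]`, since -3 - 2·8 + 1·64 = 45. Because the multiples of `x` are computed once, several multipliers sharing the operand `x` could select from the same set, which is the sharing idea mentioned in the citation statement.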
“…Unlike RPA-based designs, folded design involves simple control circuitry and it has 100 % HUE. Keeping this in view, several architectures based have been proposed for efficient implementation of lifting 2-D DWT [5][6][7][8][9][10][11][12]. Most of the designs differ by their number of arithmetic components, on-chip memory, cycle period and throughput rate.…”
In this paper, we propose a look-up-table (LUT) based structure for high-throughput implementation of multilevel lifting DWT. The proposed structure can process one block of samples to achieve a high throughput rate. Compared with the best of the similar existing structures, it does not involve any multipliers, but it requires more adders and 21504 extra ROM words for J=3; it offers less critical path delay than the existing structure. Synthesis results show that the proposed structure has less ADP, 56% less area, and 13% less power than the existing structure for block size J=2. Similarly, the proposed structure has 64% less ADP and 21% less power than the existing structure for J=3. The proposed structure is fully scalable to higher block sizes and offers the flexibility to derive area-delay-efficient structures for various applications.
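The abstract does not describe how the LUTs replace the multipliers, so the following is only one generic way to trade a constant multiplier for ROM words and adders: the input word is split into small chunks, each chunk addresses a ROM of precomputed products, and the shifted ROM outputs are added. The chunk width, the two-ROM arrangement for the signed top chunk, and the coefficient value `-406` are all hypothetical choices made for this sketch, not details taken from the paper.

```python
def build_roms(constant, chunk_bits):
    """Precompute products of `constant` with every chunk value.

    Returns two tables: one treating the chunk as unsigned (lower chunks)
    and one treating it as two's complement (most significant chunk).
    """
    size = 1 << chunk_bits
    rom_unsigned = [constant * v for v in range(size)]
    rom_signed = [constant * (v - size if v >= size // 2 else v) for v in range(size)]
    return rom_unsigned, rom_signed


def lut_constant_multiply(x, roms, chunk_bits, n):
    """Multiply an n-bit two's-complement sample x by the ROM constant.

    Uses only table look-ups, shifts, and additions.
    """
    if n % chunk_bits:
        raise ValueError("n must be a multiple of chunk_bits in this sketch")
    rom_unsigned, rom_signed = roms
    mask = (1 << chunk_bits) - 1
    chunks = n // chunk_bits
    acc = 0
    for j in range(chunks):
        address = (x >> (j * chunk_bits)) & mask
        rom = rom_signed if j == chunks - 1 else rom_unsigned
        acc += rom[address] << (j * chunk_bits)
    return acc


# Hypothetical fixed-point coefficient, chosen only for illustration.
COEFF_Q = -406
ROMS = build_roms(COEFF_Q, chunk_bits=4)
```

With 4-bit chunks and a 12-bit word, each constant needs two 16-word tables and two additions per product, which shows in miniature why a multiplier-free structure of this kind needs more adders and extra ROM.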
“…The existing DWT architectures can be classified into two categories, namely convolution-based and lifting-based [5]. Compared with the convolution-based architecture, the lifting-based architecture has several advantages with respect to energy efficiency, such as lower computation complexity and memory-efficient in-place computation [4].…”
“…The line-based architectures [5]-[8] read the image in line-by-line order. A high-throughput line-based architecture for multi-level DWT is proposed in [5], with a transposition memory of length 2.5M and a temporal memory of length 3M for an image of size M×N.…”
State-of-the-art DWT designs focus on improving hardware utilization and memory efficiency of DWT. In this paper, we consider energy efficiency as the key performance metric. Memory (external memory and on-chip memory) energy dominates the total energy consumption. We propose a DWT architecture with an overlapped block-based image scanning method that optimizes the number of external memory accesses and the on-chip memory size. Using the overlapped block-based scanning method, the required number of external memory accesses of the proposed architecture is reduced by up to 50% when compared with state-of-the-art designs. Besides, the on-chip memory size is also reduced. We implement the proposed architecture on a state-of-the-art FPGA for various image sizes. Our design sustains up to 80.2% of the peak energy efficiency of the device. Compared with the state-of-the-art design, the proposed architecture achieves up to 58.1% energy efficiency improvement.
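The citation statements above credit lifting-based architectures with lower computation complexity and in-place computation. A minimal sketch of what that means is given below, assuming the reversible integer 5/3 filter with symmetric boundary extension; the source does not say which filter these architectures use, so the filter choice and the function names are assumptions for illustration only.

```python
def lifting_53_forward(signal):
    """One level of integer 5/3 lifting on an even-length sequence.

    Works in place on a copy: odd positions end up holding detail
    coefficients, even positions hold approximation coefficients.
    """
    x = list(signal)
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch expects an even length of at least 2")
    # Predict step: each odd sample minus the average of its even neighbours.
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]  # symmetric extension
        x[i] -= (x[i - 1] + right) >> 1
    # Update step: each even sample plus a rounded quarter of neighbouring details.
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]  # symmetric extension
        x[i] += (left + x[i + 1] + 2) >> 2
    return x


def lifting_53_inverse(coeffs):
    """Undo lifting_53_forward by running the steps in reverse with opposite signs."""
    x = list(coeffs)
    n = len(x)
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]
        x[i] -= (left + x[i + 1] + 2) >> 2
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += (x[i - 1] + right) >> 1
    return x
```

Each step overwrites a sample using only its neighbours, so no second buffer is needed, and the inverse recovers the input exactly because it subtracts the same integer quantities the forward pass added.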