2020
DOI: 10.1109/access.2020.2992286
A High-Performance Multiply-Accumulate Unit by Integrating Additions and Accumulations Into Partial Product Reduction Process

Abstract: In this paper, we propose a low-power, high-speed pipelined multiply-accumulate (MAC) architecture. In a conventional MAC, carry propagations of additions (both the additions within multiplications and the additions in accumulations) often lead to large power consumption and long path delays. To resolve this problem, we integrate part of the additions into the partial product reduction (PPR) process. In the proposed MAC architecture, the addition and accumulation of higher-significance bits are not performed until the PP…
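The abstract's central idea — deferring expensive carry propagation during accumulation — can be illustrated with carry-save arithmetic. The sketch below is a minimal software analogy, not the paper's exact PPR-integrated design: products are folded into a redundant (sum, carry) pair with a 3:2 compressor, and a single carry-propagate addition happens only at the very end.

```python
# Carry-save accumulation sketch (illustrative analogy only; the paper's
# actual technique merges accumulation into the partial product reduction tree).

def carry_save_add(s, c, x):
    """3:2 compressor: fold (s, c, x) into a new (sum, carry) pair
    without propagating carries across bit positions."""
    new_s = s ^ c ^ x                            # bitwise sum, no carry ripple
    new_c = ((s & c) | (s & x) | (c & x)) << 1   # generated carries, shifted left
    return new_s, new_c

def mac(pairs):
    """Accumulate a*b over all pairs, keeping the accumulator in
    redundant carry-save form until one final carry-propagate add."""
    s, c = 0, 0
    for a, b in pairs:
        s, c = carry_save_add(s, c, a * b)
    return s + c  # the only carry-propagating addition

print(mac([(3, 4), (5, 6), (2, 7)]))  # 3*4 + 5*6 + 2*7 = 56
```

In hardware, each `carry_save_add` has constant depth regardless of operand width, which is why deferring the final carry-propagate adder shortens the critical path.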

Cited by 27 publications (24 citation statements) · References 30 publications
“…Note that the core of convolution operation is multiplication and accumulation. Therefore, in the SIMD architecture, multiply-accumulate (MAC) engines [28][29][30] are used to support convolution operations between input activations and kernel weights. No matter if a CNN is sparse or not, the compression format cannot be directly applied to the SIMD architecture; otherwise, irregularly distributed nonzero values will break the alignment of input activations and kernel weights.…”
Section: Related Work
confidence: 99%
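The statement above notes that convolution reduces to multiply-accumulate and that SIMD MAC engines rely on activations and weights staying aligned. A minimal 1-D convolution sketch (illustrative only, not the cited accelerators' datapath) makes both points concrete: each output is one MAC chain, and element `i + j` of the input must line up with weight `j`.

```python
# 1-D convolution as repeated multiply-accumulate (valid padding).
# Irregularly removing zeros from `activations` (as in a sparse compression
# format) would break the i+j alignment this inner loop depends on.

def conv1d(activations, weights):
    k = len(weights)
    out = []
    for i in range(len(activations) - k + 1):
        acc = 0
        for j in range(k):                      # one MAC per kernel weight
            acc += activations[i + j] * weights[j]
        out.append(acc)
    return out

print(conv1d([1, 2, 3, 4], [1, 0, -1]))  # [-2, -2]
```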
“…Hardware simulation was performed on an Artix xc7a200tffg1156-3 FPGA in Xilinx Vivado 18.3 using the VHDL hardware description language. The goal of the simulation was to compare the technical characteristics of the FIR DF implemented using known architectures in PNS [30] and in RNS [11,14] against the FIR DF using the proposed RNS architecture with different moduli sets. Table V shows the results of hardware simulation of the 15th-order FIR DF at different bit widths.…”
Section: Hardware Simulation of Digital Filters in the Residue Number System
confidence: 99%
“…Comparison with the known method [11] based on RNS with 4 moduli showed that the proposed method increases the frequency of the 15th-order FIR DF by 1.7-5.0 times and reduces the hardware cost of its implementation by 1.5-4.8 times, at the price of a 7%-30% increase in power consumption. With a 5-modulus RNS, the proposed method increases the frequency of the 15th-order FIR DF by 2.0-4.2 times and reduces hardware cost by 1.1-2.6 times, with a 7%-33% increase in power consumption compared to the known PNS-based method [30]. Comparison with the known method [11] based on RNS with 5 moduli showed that the proposed method increases the frequency of the 15th-order FIR DF by 1.6-4.4 times and reduces hardware cost by 1.8-2.5 times, with an 11%-41% increase in power consumption.…”
Section: Hardware Simulation of Digital Filters in the Residue Number System
confidence: 99%
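The excerpts above compare FIR filters built on the residue number system (RNS) against a positional number system (PNS). The appeal of RNS for MAC-heavy filters is that multiply-accumulate proceeds independently per modulus, with no carries crossing channels. The sketch below uses an illustrative moduli set {7, 11, 13} chosen for this example, not the moduli sets of the cited works.

```python
# RNS multiply-accumulate sketch with an illustrative moduli set.
from math import prod

MODULI = (7, 11, 13)  # pairwise coprime; dynamic range M = 7*11*13 = 1001

def to_rns(x):
    """Forward conversion: integer -> residue tuple."""
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    """Channel-wise MAC: each modulus is processed independently,
    so no carry ever propagates between channels."""
    return tuple((r + (a % m) * (b % m)) % m for r, m in zip(acc, MODULI))

def from_rns(res):
    """Reverse conversion via the Chinese Remainder Theorem."""
    M = prod(MODULI)
    x = 0
    for r, m in zip(res, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return x % M

acc = to_rns(0)
acc = rns_mac(acc, 3, 4)
acc = rns_mac(acc, 5, 6)
print(from_rns(acc))  # 3*4 + 5*6 = 42
```

The trade-off reported above follows the same shape: per-modulus channels are narrow and fast (higher frequency, lower adder cost), while the conversion and extra channels cost additional power.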
“…In recent years, researchers have published numerous works in this area [2][3][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21]. In 2007, reference [22] proposed a high-throughput MAC architecture with an optimized area.…”
Section: Introduction to Multiply and Accumulate (MAC) Architecture
confidence: 99%