2017
DOI: 10.1109/lca.2017.2656880

Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks

Abstract: Convolutional Neural Networks (CNNs) are one of the most successful deep machine learning technologies for processing image, voice and video data. CNNs require large amounts of processing capacity and memory, which can exceed the resources of low-power mobile and embedded systems. Several hardware accelerator designs have been proposed for CNNs, typically containing large numbers of Multiply Accumulate (MAC) units. One approach to reducing data sizes and memory traffic in CNN accelerators is "weight sharing"…
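The central idea is compact enough to sketch in a few lines of Python. The snippet below is a minimal, hypothetical rendering of a parallel accumulate shared MAC (PASM), not the paper's hardware: because shared weights take only K distinct codebook values, activations can first be summed into K bins (additions only), and each bin total is multiplied by its weight exactly once. The function pasm_dot and all sample values are our own.

```python
def pasm_dot(activations, weight_indices, codebook):
    """Dot product under weight sharing: accumulate into one bin per
    codebook entry (adds only), then do a single multiply per entry."""
    bins = [0] * len(codebook)
    for a, idx in zip(activations, weight_indices):
        bins[idx] += a                        # accumulate phase: no multiplies
    return sum(b * w for b, w in zip(bins, codebook))  # K multiplies total

# Cross-check against a conventional per-element MAC loop.
acts = [3, -1, 4, 1, -5, 9]
idxs = [0, 2, 1, 2, 0, 1]    # codebook index of each shared weight
cb   = [0.5, -1.25, 2.0]     # K = 3 shared weight values (hypothetical)
conventional = sum(a * cb[i] for a, i in zip(acts, idxs))
assert abs(pasm_dot(acts, idxs, cb) - conventional) < 1e-9
```

For a dot product of length N with K shared weights, this trades N multiplies for N bin-indexed additions plus K multiplies, which is where the multiplier area and power savings come from.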


Cited by 28 publications (9 citation statements)
References 8 publications
“…Garland et al. [11] show that they can vary the bit-width of their parallel accumulate shared MAC (PASM) between 4-bit and 32-bit in an ASIC and maintain performance and accuracy while reducing the power and area of the multiplier. In their follow-up work, Garland et al. [12] show that PASM can be implemented on an FPGA with the bit-width varied between INT8 and 32-bit, saving significant energy with only a slight increase in latency and no change in classification accuracy.…”
Section: Related Work (mentioning, confidence: 99%)
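The bit-width scaling described above rests on ordinary uniform quantization. As a rough illustration only (this is not Garland et al.'s exact scheme; quantize_symmetric and the random sample weights are hypothetical), narrowing the multiplier operands from 8 to 4 bits coarsens the weight grid, while a wide 32-bit accumulator keeps the running sums exact:

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Uniform symmetric quantization of weights to a signed `bits`-wide
    integer grid (illustrative sketch, not the PASM papers' exact scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax        # map the largest |weight| to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(8).astype(np.float32)
for bits in (4, 8):                          # INT4 vs INT8 multiplier operands
    q, scale = quantize_symmetric(w, bits)
    err = np.max(np.abs(q * scale - w))      # error grows as bits shrink
    print(f"INT{bits}: max abs error {err:.4f}")
```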
“…The 16-PAS-4-MAC also consumes 61% less leakage power, 70% less dynamic power and 70% less total power (Figure 10). More details can be found in our original paper [Garland and Gregg 2017].…”
Section: Evaluation of PASM as a Stand-Alone Unit (mentioning, confidence: 99%)
“…1c are not used during 16 × 16-bit MAC mode. In most of the non-vector MAC designs [6][7][8][18][19][20][21], the flexibility to perform multiple MAC operations is absent. For example, the second, third, and fourth quarters of Figs.…”
Section: Related Work (mentioning, confidence: 99%)
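What that flexibility means in practice is that one wide multiplier can serve several narrow operands at once. Below is a purely illustrative Python sketch of computing two 8-bit products with a single wide multiply; packed_dual_mul and SHIFT are hypothetical names, and real vector MACs achieve this with partitioned partial-product arrays rather than operand packing:

```python
SHIFT = 18                      # > 16 so the low product cannot spill over

def packed_dual_mul(a0, a1, b):
    """Compute a0*b and a1*b (all unsigned 8-bit) with one wide multiply."""
    packed = (a1 << SHIFT) | a0          # pack both multiplicands
    prod = packed * b                    # single wide multiplication
    lo = prod & ((1 << SHIFT) - 1)       # a0*b (fits in 16 bits < 2**18)
    hi = prod >> SHIFT                   # a1*b
    return lo, hi

assert packed_dual_mul(200, 13, 77) == (200 * 77, 13 * 77)
```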
“…In [20], the previous MAC result is added along with the sum and carry from the last carry-save stage of the Wallace tree multiplier. In [21], a memory-based conventional MAC is designed, where one of the operands is sent to the multiplier from the memory. In most of the above-mentioned existing n×n-bit vector MACs ([14–16]), the hardware utilisation is lower during (n/2)×(n/2)-bit or n×(n/2)-bit modes of operation.…”
Section: Introduction (mentioning, confidence: 99%)
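The design in [20] keeps the running accumulator in redundant carry-save form, so the sum and carry vectors from the multiplier's final carry-save stage are folded back in each cycle and the expensive carry-propagate addition is performed only once at the end. A minimal Python model of that idea follows; csa and mac_carry_save are our own hypothetical names, and x * y stands in for the Wallace tree's partial-product reduction:

```python
def csa(a, b, c):
    """3:2 carry-save adder: compress three operands into a sum word and a
    carry word, with no carry propagation across bit positions."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s, cy

def mac_carry_save(x, y, acc_sum, acc_carry, width=32):
    """One MAC step that folds the redundant accumulator into the final
    carry-save stage instead of resolving it every cycle."""
    mask = (1 << width) - 1
    p = (x * y) & mask                  # stand-in for the Wallace-tree output
    s, c = csa(p, acc_sum, acc_carry)
    return s & mask, c & mask

acc_s, acc_c = 0, 0
for x, y in [(3, 7), (5, 11), (2, 13)]:
    acc_s, acc_c = mac_carry_save(x, y, acc_s, acc_c)
# the single carry-propagate addition, done once at the end
assert acc_s + acc_c == 3*7 + 5*11 + 2*13
```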