Convolutional neural networks (CNNs) are among the most successful machine learning techniques for image, voice, and video processing, but they require large amounts of processing capacity and memory bandwidth. Hardware accelerators proposed for CNNs typically contain large numbers of multiply-accumulate (MAC) units, whose multipliers dominate integrated circuit (IC) gate count and power consumption. "Weight sharing" accelerators have been proposed in which the full range of weight values in a trained CNN is compressed into bins, and the bin index is used to access the shared weight value. We reduce the power and area of the CNN by implementing a parallel accumulate shared MAC (PASM) in a weight-shared CNN. PASM re-architects the MAC to instead count the frequency of each weight and place it in a bin; the accumulated value is computed in a subsequent multiply phase, significantly reducing the gate count and power consumption of the CNN. In this paper, we implement PASM in a weight-shared CNN convolution hardware accelerator and analyze its effectiveness. Experiments show that, for a clock speed of 1 GHz on a 45 nm ASIC process, our approach results in fewer gates, smaller logic, and reduced power, with only a slight increase in latency. We also show that the same weight-shared-with-PASM CNN accelerator can be implemented in resource-constrained FPGAs, where the FPGA has a limited number of digital signal processor (DSP) units to accelerate the MAC operations.
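To make the two-phase arithmetic concrete, here is a minimal Python sketch of a PASM-style weight-shared dot product. This is our illustration, not the paper's hardware design; the function name pasm_dot, the four shared values, and the vector sizes are assumptions for the example. Phase one routes each activation into a per-bin accumulator using only adders and selection, and phase two performs one multiply per bin.

```python
import numpy as np

def pasm_dot(bin_indices, activations, shared_weights):
    """PASM-style two-phase weight-shared dot product (illustrative).

    Phase 1 (accumulate): route each activation into a per-bin
    accumulator selected by its weight's bin index -- adders and
    selection logic only, no multipliers.
    Phase 2 (multiply): one multiply per bin, applied to the
    accumulated per-bin sums.
    """
    bin_sums = np.zeros(len(shared_weights))
    for b, x in zip(bin_indices, activations):      # phase 1
        bin_sums[b] += x
    return float(np.dot(bin_sums, shared_weights))  # phase 2

rng = np.random.default_rng(0)
shared = np.array([-0.5, -0.1, 0.2, 0.7])   # 4 shared weight values (bins)
idx = rng.integers(0, 4, size=1024)         # bin index stored per weight
x = rng.standard_normal(1024)               # input activations
# Matches the conventional MAC result, but with 4 multiplies instead of 1024.
assert np.isclose(pasm_dot(idx, x, shared), np.dot(shared[idx], x))
```

The multiply count drops from one per input to one per bin, which is why the multipliers, the dominant cost in the MAC, can shrink or disappear from the accumulate path.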
Logarithmic number systems (LNS) represent real numbers in many applications as a constant base raised to a fixed-point exponent, making the distribution of representable values exponential. This greatly simplifies hardware multiplication, division, and square root. Base-2 LNS is the most common, but in this article we show that for low-precision LNS the choice of base has a significant impact. We make four main contributions. First, LNS is not closed under addition and subtraction, so the result is approximate; we show that choosing a suitable base manipulates the distribution of representable values to reduce the average error. Second, we show that low-precision LNS addition and subtraction can be implemented efficiently in logic rather than the commonly used ROM lookup tables, and that the complexity of this logic can be reduced by an appropriate choice of base. A similar effect arises where the result of the arithmetic has greater precision than the input. Third, where input data from external sources is not expected to be in LNS, we can reduce the conversion error by selecting an LNS base that matches the expected distribution of the input. No single base gives the global optimum, so base selection is a trade-off between these factors. Fourth, we show that circuits realized in LNS require lower area and power consumption for short word lengths.
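As a rough software model of these ideas, the sketch below encodes a value as a sign and a fixed-point base-b exponent, so multiplication is an exact exponent addition while addition and subtraction need a Gaussian-log correction followed by requantization. The helper names and the frac_bits parameter are our assumptions, not the paper's notation, and the final loop only gestures at the conversion-error effect of the base (contribution three), ignoring the accompanying loss of dynamic range.

```python
import math

def to_lns(x, base, frac_bits):
    """Encode nonzero x as (sign, fixed-point base-`base` exponent)."""
    step = 2.0 ** -frac_bits
    e = round(math.log(abs(x), base) / step) * step
    return (math.copysign(1.0, x), e)

def from_lns(v, base):
    sign, e = v
    return sign * base ** e

def lns_mul(a, b):
    """Multiplication is exact in LNS: multiply signs, add exponents."""
    return (a[0] * b[0], a[1] + b[1])

def lns_add(a, b, base, frac_bits):
    """Addition/subtraction uses the Gaussian log log_base(1 +/- base^d)
    and must requantize the result, so it is approximate."""
    (sa, ea), (sb, eb) = (a, b) if a[1] >= b[1] else (b, a)
    d = eb - ea                        # d <= 0 by the ordering above
    r = 1 + base ** d if sa == sb else 1 - base ** d
    if r == 0:                         # exact cancellation: LNS has no zero,
        return (sa, float("-inf"))     # so use exponent -inf as a stand-in
    step = 2.0 ** -frac_bits
    return (sa, round((ea + math.log(r, base)) / step) * step)

# A smaller base packs representable values more densely (at the cost of
# dynamic range), so the best base depends on the input distribution.
for base in (2.0, 1.5):
    xs = [0.1 + 0.01 * i for i in range(1, 200)]
    err = sum(abs(from_lns(to_lns(x, base, 5), base) - x) / x for x in xs)
    print(f"base={base}: mean relative conversion error {err / len(xs):.4f}")
```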
Convolutional Neural Networks (CNNs) are one of the most successful deep learning technologies for processing image, voice, and video data. CNNs require large amounts of processing capacity and memory, which can exceed the resources of low-power mobile and embedded systems. Several hardware accelerator designs have been proposed for CNNs, typically containing large numbers of Multiply Accumulate (MAC) units. One approach to reducing data sizes and memory traffic in CNN accelerators is "weight sharing", where the full range of weight values in a trained CNN is put into bins and the bin index is stored instead of the original weight value. In this paper we propose a novel MAC circuit that exploits binning in weight-sharing CNNs. Rather than computing the MAC directly, we instead count the frequency of each weight and place it in a bin; we then compute the accumulated value in a subsequent multiply phase. This allows the hardware multipliers in the MAC circuit to be replaced with adders and selection logic. Experiments show that, for the same clock speed, our approach results in fewer gates, smaller logic, and reduced power.
Index Terms: Convolutional neural network, power efficiency, multiply accumulate, arithmetic hardware circuits.
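The binning step that produces the stored indices can be sketched as follows. This abstract does not specify how the bins are formed, so the uniform binning below is purely illustrative (k-means clustering is another common choice in weight-sharing work); bin_weights and the 16-bin setting are hypothetical names and parameters.

```python
import numpy as np

def bin_weights(weights, n_bins):
    """Weight sharing by uniform binning: each weight is replaced by a
    small bin index, and one shared value (the bin mean) is kept per bin."""
    edges = np.linspace(weights.min(), weights.max(), n_bins + 1)
    idx = np.clip(np.digitize(weights, edges) - 1, 0, n_bins - 1)
    shared = np.array([weights[idx == b].mean() if np.any(idx == b)
                       else 0.5 * (edges[b] + edges[b + 1])
                       for b in range(n_bins)])
    return idx.astype(np.uint8), shared

w = np.random.default_rng(1).standard_normal(4096).astype(np.float32)
idx, shared = bin_weights(w, 16)          # 16 bins -> 4-bit indices
print("max quantization error:", float(np.abs(shared[idx] - w).max()))
```

Storing 4-bit indices in place of full-width weights is what cuts the data size and memory traffic, and the small set of shared values is what the bin-counting MAC above exploits.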