An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks

Sakr, Charbel; Shanbhag, Naresh R.

doi:10.1109/icassp.2018.8461702

Cited by 40 publications

(53 citation statements)

References 3 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Among these techniques, reduced-precision computing has demonstrated large benefits with low or negligible impact on the network accuracy [9], [10]. It has been shown that the optimal precision not only varies from one neural network to another, it also can vary within a neural network itself [11], [12], from layer to layer or even from channel to channel. This has lead to a new trend of ultra-efficient run-time precisionscalable neural processors for embedded Deep Neural Network (DNN) processing in mobile devices and IoT nodes.…”

mentioning

confidence: 99%

Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing

Camus

Mei

Enz

et al. 2019

IEEE J. Emerg. Sel. Topics Circuits Syst.

View full text Add to dashboard Cite

The current trend for deep learning has come with an enormous computational need for billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced precision has demonstrated large benefits with low impact on accuracy, paving the way towards processing in mobile devices and IoT nodes. To this end, various precision-scalable MAC architectures optimized for neural networks have recently been proposed. Yet, it has been hard to comprehend their differences and make a fair judgment of their relative benefits as they have been implemented with different technologies and performance targets. To overcome this, this work exhaustively reviews the state-of-the-art precision-scalable MAC architectures and unifies them in a new taxonomy. Subsequently, these different topologies are thoroughly benchmarked in a 28 nm commercial CMOS process, across a wide range of performance targets, and with precision ranging from 2 to 8 bits. Circuits are analyzed for each precision as well as jointly in practical use cases, highlighting the impact of architectures and scalability in terms of energy, throughput, area and bandwidth, aiming to understand the key trends to reduce computation costs in neural-network processing.Index Terms-ASIC, deep neural networks, precision-scalable circuits, configurable circuits, MAC, multiply-accumulate units. I. INTRODUCTIONE MBEDDED deep learning has gained a lot of attention nowadays due to its broad application prospects and vast potential market. However, the main challenge to embrace this era of edge intelligence comes from the supply-anddemand gap between the limited energy budget of embedded devices, often battery powered, and the computationallyintensive deep-learning algorithms, requiring billions of Multiply-Accumulate (MAC) operations and data movements.To alleviate this unbalanced relationship, many approaches have been investigated at different levels of abstraction. At algorithmic level, researchers have introduced hardware-

show abstract

mentioning

confidence: 99%

Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing

Camus

Mei

Enz

et al. 2019

IEEE J. Emerg. Sel. Topics Circuits Syst.

View full text Add to dashboard Cite

show abstract

“…Prior work [25,26], indicates the requirement SNR T(dB) > SNR * T(dB) = 10 dB-40 dB (see Fig. 2) for the inference accuracy of an FX network to be within 1% of the corresponding FL network for popular DNNs (AlexNet, VGG-9, VGG-16, ResNet-18) deployed on the ImageNet and CIFAR-10 datasets.…”

Section: Precision Assignment Methodology For Imcsmentioning

confidence: 99%

“…The BGC is commonly employed to assign the output precision in digital architectures [9,25]. BGC sets as:…”

Section: Bit Growth Criterion (Bgc)mentioning

confidence: 99%

Fundamental limits on the precision of in-memory architectures

Gonugondla

Sakr

Dbouk

et al. 2020

Proceedings of the 39th International Conference on Computer-Aided Design

Self Cite

View full text Add to dashboard Cite

“…As a counterexample, Li et al provided derivations for quantised DNNs' convergence criteria [79]. Sakr et al also investigated analytical guarantees on the numerical precision of DNNs with xed-point quantisation [119].…”

Section: Research Objectivesmentioning

confidence: 99%

Deep Neural Network Approximation for Custom Hardware

et al. 2019

View full text Add to dashboard Cite

LondonDeep neural networks have proven to be particularly e ective in visual and audio recognition tasks. Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardwareoriented approximation have become a hot topic. Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms of both throughput and energy e ciency. Application-tailored accelerators, when co-designed with approximation-based network training methods, transform large, dense and computationally expensive networks into small, sparse and hardware-e cient alternatives, increasing the feasibility of network deployment. In this article, we provide a comprehensive evaluation of approximation methods for high-performance network inference along with in-depth discussion of their e ectiveness for custom hardware implementation. We also include proposals for future research based on a thorough analysis of current trends. is article represents the rst survey providing detailed comparisons of custom hardware accelerators featuring approximation for both convolutional and recurrent neural networks, through which we hope to inspire exciting new developments in the eld.

show abstract

An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks

Cited by 40 publications

References 3 publications

Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing

Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing

Fundamental limits on the precision of in-memory architectures

Deep Neural Network Approximation for Custom Hardware

Contact Info

Product

Resources

About