Stochastic Data-driven Hardware Resilience to Efficiently Train Inference Models for Stochastic Hardware Implementations

Zhang, Bonan; Chen, Lung-Yen; Verma, Naveen

doi:10.1109/icassp.2019.8683521

Cited by 21 publications

(6 citation statements)

References 12 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In NI-based training, Gaussian noise N(0, σ 2 NI ) is added to network weights during the feed-forward pass of the back-propagation iterations during training. NI-based training has been observed to improve robustness to line resistance [8], and to device mismatch and conductance variation in [13,19]. We studied the impact of noise injection, by training networks with different values of σ NI , ranging from 0% to 10%.…”

Section: Robustness Improvements In Dnn Implementationsmentioning

confidence: 99%

“…For example, the achievable inference accuracy of crossbar architectures is limited by IR drop, device non-linearity, thermal noise, process variations, stuck-atfaults, write noise, and limited device endurance. A number of works have addressed these challenges such as [8] (IR drop), [19] (device variations), and [13] (conductance variations). Recently, onchip training methods have been proposed [16] to minimize the impact of die-specific variations.…”

Section: Introductionmentioning

confidence: 99%

“…However, these methods are expensive and can only be employed if the memory writes are infrequent and they do not address C2C variations which occur in every write cycle. Alternatively, noise injection-based one-time offline training methods [8,13,19] determine an averaged set of network parameters for an ensemble of dies thereby avoiding the cost of on-chip learning. However, such methods incur a significant loss in inference accuracy as compared to on-chip training methods.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Swipe

Gonugondla

Patil

Shanbhag

2020

Proceedings of the 39th International Conference on Computer-Aided Design

View full text Add to dashboard Cite

Section: Robustness Improvements In Dnn Implementationsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Swipe

Gonugondla

Patil

Shanbhag

2020

Proceedings of the 39th International Conference on Computer-Aided Design

View full text Add to dashboard Cite

“…The crossbar is assumed to use for inference in this paper. However, the non-ideal behavior of the device in the inference stage may significantly decrease the application level accuracy [38], which prevent the use of the emerging devices crossbar. In this work, we propose to use a modified training method to alleviate the impact of non-ideal behavior of the device and circuit, as shown in the right component of Figure 3 ➂.…”

Section: Cross-layer Exploration Frameworkmentioning

confidence: 99%

Device-Circuit-Architecture Co-Exploration for Computing-in-Memory Neural Accelerators

Jiang

Lou

Yan

et al. 2019

Preprint

View full text Add to dashboard Cite

Co-exploration of neural architectures and hardware design is promising to simultaneously optimize network accuracy and hardware efficiency. However, state-of-the-art neural architecture search algorithms for the co-exploration are dedicated for the conventional von-neumann computing architecture, whose performance is heavily limited by the well-known memory wall. In this paper, we are the first to bring the computing-inmemory architecture, which can easily transcend the memory wall, to interplay with the neural architecture search, aiming to find the most efficient neural architectures with high network accuracy and maximized hardware efficiency. Such a novel combination makes opportunities to boost performance, but also brings a bunch of challenges. The design space spans across multiple layers from device type, circuit topology to neural architecture. In addition, the performance may degrade in the presence of device variation. To address these challenges, we propose a crosslayer exploration framework, namely NACIM, which jointly explores device, circuit and architecture design space and takes device variation into consideration to find the most robust neural architectures. Experimental results demonstrate that NACIM can find the robust neural network with 0.45% accuracy loss in the presence of device variation, compared with a 76.44% loss from the state-of-the-art NAS without consideration of variation; in addition, NACIM achieves an energy efficiency up to 16.3 TOPs/W, 3.17× higher than the state-of-the-art NAS.

show abstract

“…machine learning, to enable reduced SNR by incorporating models of the computational noise, in both chip-specific [16]- [18] and chip-generalized [19] training algorithms. This has shown promise, warranting the further research needed to transition such models toward generalized abstractions, suitable for application design and mapping across the range of hardware design parameters and operating conditions.…”

mentioning

confidence: 99%

A Programmable Heterogeneous Microprocessor Based on Bit-Scalable In-Memory Computing

Jia

Valavi

Tang

et al. 2020

IEEE J. Solid-State Circuits

Self Cite

142

View full text Add to dashboard Cite

In-memory computing (IMC) addresses the cost of accessing data from memory in a manner that introduces a tradeoff between energy/throughput and computation signal-tonoise ratio (SNR). However, low SNR posed a primary restriction to integrating IMC in larger, heterogeneous architectures required for practical workloads due to the challenges with creating robust abstractions necessary for the hardware and software stack. This work exploits recent progress in high-SNR IMC to achieve a programmable heterogeneous microprocessor architecture implemented in 65-nm CMOS and corresponding interfaces to the software that enables mapping of application workloads. The architecture consists of a 590-Kb IMC accelerator, configurable digital near-memory-computing (NMC) accelerator, RISC-V CPU, and other peripherals. To enable programmability, microarchitectural design of the IMC accelerator provides the integration in the standard processor memory space, areaand energy-efficient analog-to-digital conversion for interfacing to NMC, bit-scalable computation (1-8 b), and input-vector sparsity-proportional energy consumption. The IMC accelerator demonstrates excellent matching between computed outputs and idealized software-modeled outputs, at 1b TOPS/W of 192|400 and 1b-TOPS/mm 2 of 0.60|0.24 for MAC hardware, at V D D of 1.2|0.85 V, both of which scale directly with the bit precision of the input vector and matrix elements. Software libraries developed for application mapping are used to demonstrate CIFAR-10 image classification with a ten-layer CNN, achieving accuracy, throughput, and energy of 89.3%|92.4%, 176|23 images/s, and 5.31|105.2 µJ/image, for 1|4 b quantization levels. Index Terms-Charge-domain compute, deep learning, hardware accelerators, in-memory computing (IMC), neural networks (NNs). I. INTRODUCTION M ACHINE-LEARNING inference, particularly based on neural networks (NNs), has provided unprecedented capabilities in various cognitive tasks, such as vision and language processing. [1]-[5]. However, pervasive deployment, especially in edge applications, has been limited by the high computational requirements that are dominated by highdimensionality matrix-vector multiplications (MVMs). To address this, many optimizations, focusing on both hardware specialization and model design (e.g., sparsity, compression,

show abstract

Stochastic Data-driven Hardware Resilience to Efficiently Train Inference Models for Stochastic Hardware Implementations

Cited by 21 publications

References 12 publications

Swipe

Swipe

Device-Circuit-Architecture Co-Exploration for Computing-in-Memory Neural Accelerators

A Programmable Heterogeneous Microprocessor Based on Bit-Scalable In-Memory Computing

Contact Info

Product

Resources

About