Abinash Mohanty scite author profile

Resistive random access memory (RRAM) is a promising technology for energy-efficient neuromorphic accelerators. However, when a pretrained deep neural network (DNN) model is programmed to an RRAM array for inference, the model suffers from accuracy degradation due to RRAM nonidealities, such as device variations, quantization error, and stuck-at faults. Previous solutions that apply multiple read-verify-write (R-V-W) cycles to the RRAM cells require cell-by-cell compensation and, thus, an excessive amount of processing time. In this article, we propose a joint algorithm-design solution to mitigate the accuracy degradation. We first leverage knowledge distillation (KD), where the model is trained with the RRAM nonidealities to increase its robustness under device variations. Furthermore, we propose random sparse adaptation (RSA), which integrates a small on-chip memory with the main RRAM array for postmapping adaptation. Only the on-chip memory is updated to recover the inference accuracy. The joint algorithm-design solution achieves state-of-the-art accuracy of 99.41% for MNIST (LeNet-5) and 91.86% for CIFAR-10 (VGG-16) with up to 5% of the parameters as overhead while providing a 15-150× speedup compared with R-V-W.

Index Terms: Convolution neural networks, device nonidealities, model robustness, neuromorphic computing, random sparse adaptation (RSA), resistive random access memory (RRAM).
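The postmapping adaptation idea described in this abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the single linear layer, the multiplicative Gaussian variation model, the additive form of the correction, the mask size, and the learning rate are all assumptions chosen for illustration. The sketch keeps the noisy "RRAM" weights fixed and trains only a randomly placed, sparse set of "on-chip" corrections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single linear layer standing in for a pretrained DNN.
w_ideal = rng.normal(size=(16, 8))
x = rng.normal(size=(256, 16))
y_target = x @ w_ideal

# Assumed nonideality model: multiplicative Gaussian variation per cell.
w_rram = w_ideal * (1.0 + 0.2 * rng.normal(size=w_ideal.shape))
w_rram_before = w_rram.copy()

# Random sparse mask: 6 of 128 positions (about 5%) get an on-chip correction.
idx = rng.choice(w_ideal.size, size=6, replace=False)
mask = np.zeros(w_ideal.size, dtype=bool)
mask[idx] = True
mask = mask.reshape(w_ideal.shape)

# Assumed additive correction stored in the small on-chip memory.
delta = np.zeros_like(w_ideal)

def loss(delta):
    """Mean squared error of the layer output with corrections applied."""
    err = x @ (w_rram + mask * delta) - y_target
    return float(np.mean(err ** 2))

loss_before = loss(delta)
for _ in range(200):
    err = x @ (w_rram + mask * delta) - y_target
    grad = 2.0 * x.T @ err / err.size   # gradient of the MSE w.r.t. the weights
    delta -= 0.5 * mask * grad          # update only the masked (on-chip) entries
loss_after = loss(delta)

print(f"MSE before adaptation: {loss_before:.4f}, after: {loss_after:.4f}")
```

Because the RRAM weights are never rewritten, the sketch avoids any cell-by-cell reprogramming; how much of the error a 5% mask can recover in a real network is a question the toy example does not answer.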


Parallel Architecture With Resistive Crosspoint Array for Dictionary Learning Acceleration

Kadetotad

Mohanty

et al. 2015

IEEE J. Emerg. Sel. Topics Circuits Syst.


This paper proposes a parallel architecture with a resistive crosspoint array. The design of its two essential operations, read and write, is inspired by the biophysical behavior of a neural system, such as integrate-and-fire and local synapse weight update. The proposed hardware consists of an array of resistive random access memory (RRAM) devices and CMOS peripheral circuits, which perform matrix-vector multiplication and dictionary update in a fully parallel fashion, at a speed that is independent of the matrix dimension. The read and write circuits are implemented in 65 nm CMOS technology and verified together with an array of RRAM device models built from experimental data. The overall system exploits array-level parallelism and is demonstrated for accelerated dictionary learning tasks. Compared to a software implementation running on an 8-core CPU, the proposed hardware achieves a speedup of more than 3000×, enabling high-speed feature extraction on a single chip.

Index Terms: CMOS integration, dictionary learning, memristive device, parallel computing, resistive crosspoint array.
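The parallel read operation mentioned in this abstract rests on a standard property of crosspoint arrays: each column current is the sum of the row voltages weighted by the device conductances, so a matrix-vector product falls out of one analog read. The sketch below models only that idealized behavior; the function name `crosspoint_read`, the conductance range, and the voltages are hypothetical, and it ignores wire resistance, device nonlinearity, and the integrate-and-fire peripheral circuits of the paper.

```python
import numpy as np

def crosspoint_read(conductance, voltages):
    """Column currents of an idealized crosspoint array.

    conductance[i, j]: device between row i and column j, in siemens.
    voltages[i]: read voltage applied to row i, in volts.
    """
    # Ohm's law per device, Kirchhoff's current law per column.
    return voltages @ conductance

rng = np.random.default_rng(1)
g = rng.uniform(1e-6, 1e-4, size=(4, 3))   # assumed conductance range
v = np.array([0.1, 0.0, 0.2, 0.1])         # assumed read voltages
i_cols = crosspoint_read(g, v)

# Cell-by-cell reference, as sequential software would compute it.
i_ref = np.zeros(3)
for row in range(4):
    for col in range(3):
        i_ref[col] += v[row] * g[row, col]
```

In the array all columns settle at once, which is why the abstract can describe the read time as independent of the matrix dimension, whereas the reference loop scales with the number of cells.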


On-Chip Sparse Learning Acceleration With CMOS and Resistive Synaptic Devices

Seo

Lin

Kim

et al. 2015

IEEE Trans. Nanotechnology



Abinash Mohanty

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Technology-design Co-optimization of Resistive Cross-point Array for Accelerating Learning Algorithms on Chip

Accurate Inference With Inaccurate RRAM Devices: A Joint Algorithm-Design Solution

Parallel Architecture With Resistive Crosspoint Array for Dictionary Learning Acceleration

On-Chip Sparse Learning Acceleration With CMOS and Resistive Synaptic Devices
