2020
DOI: 10.1109/jxcdc.2020.2987605

Accurate Inference With Inaccurate RRAM Devices: A Joint Algorithm-Design Solution

Abstract: Resistive random access memory (RRAM) is a promising technology for energy-efficient neuromorphic accelerators. However, when a pretrained deep neural network (DNN) model is programmed to an RRAM array for inference, the model suffers from accuracy degradation due to RRAM nonidealities, such as device variations, quantization error, and stuck-at-faults. Previous solutions involving multiple read-verify-write (R-V-W) to the RRAM cells require cell-by-cell compensation and, thus, an excessive amount of processing…
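The nonidealities listed in the abstract can be made concrete with a small simulation. The sketch below is purely illustrative and is not the paper's joint algorithm-design solution; the function name perturb_weights and the parameter values (sigma, p_stuck, n_levels) are assumptions chosen only to show how quantization, device variation, and stuck-at-faults perturb the output of one weight matrix.

```python
import numpy as np

# Hypothetical illustration (not the paper's method): inject RRAM nonidealities
# into a trained weight matrix and compare ideal vs. perturbed outputs.
rng = np.random.default_rng(0)

def perturb_weights(w, sigma=0.1, p_stuck=0.01, n_levels=16):
    """Apply quantization, lognormal device variation, and stuck-at-faults."""
    # quantize to n_levels uniform levels over the weight range
    w_min, w_max = w.min(), w.max()
    step = (w_max - w_min) / (n_levels - 1)
    w_q = np.round((w - w_min) / step) * step + w_min
    # multiplicative lognormal spread models device-to-device variation
    w_var = w_q * rng.lognormal(mean=0.0, sigma=sigma, size=w.shape)
    # stuck-at-faults: a fraction of cells frozen at the min or max level
    mask = rng.random(w.shape) < p_stuck
    stuck_vals = rng.choice([w_min, w_max], size=w.shape)
    return np.where(mask, stuck_vals, w_var)

w = rng.standard_normal((128, 64)) * 0.05   # stand-in for one trained layer
x = rng.standard_normal(128)
y_ideal = x @ w
y_rram = x @ perturb_weights(w)
print("relative output error:", np.linalg.norm(y_rram - y_ideal) / np.linalg.norm(y_ideal))
```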

Cited by 32 publications (50 citation statements)
References 29 publications
“…This would be a fundamentally necessary feature for synapse weight to learn or classify the characteristics of image patterns with analog levels. These synaptic properties related to quantization are generally consistent with those reported in the literature, although they may be slightly affected by the neural network structure, the number of parameters, and the complexity of the image dataset [32], [33]. The weight values from software training can be converted into conductivity of synaptic devices.…”
Section: Results (supporting)
confidence: 81%
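The snippet above notes that software-trained weight values can be converted into device conductances. Below is a minimal sketch of one common scheme, a linear mapping onto a differential conductance pair; the conductance window (g_min, g_max) and the helper name weights_to_conductance are illustrative assumptions, not values from the cited works.

```python
import numpy as np

# Hypothetical sketch: map signed trained weights onto a device conductance
# window [g_min, g_max] using a differential pair (g_pos, g_neg), so that the
# effective weight is proportional to (g_pos - g_neg).
def weights_to_conductance(w, g_min=1e-6, g_max=1e-4):
    scale = (g_max - g_min) / np.abs(w).max()
    g_pos = g_min + scale * np.clip(w, 0, None)    # positive part of the weight
    g_neg = g_min + scale * np.clip(-w, 0, None)   # negative part of the weight
    return g_pos, g_neg

w = np.random.default_rng(1).standard_normal((4, 4)) * 0.1
g_pos, g_neg = weights_to_conductance(w)
# (g_pos - g_neg) recovers the weight up to the linear scale factor
w_back = (g_pos - g_neg) * np.abs(w).max() / (1e-4 - 1e-6)
assert np.allclose(w_back, w)
```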
“…However, these large models require on the order of millions of multiply-accumulate operations, which are fundamentally computationally intensive operations [6]. These data-intensive operations and a large number of model parameters due to the model size mean that these models would require large memory and memory bandwidth in order to achieve reasonable performance [7]-[10]. As a result, these deep models cannot be deployed on resource-constrained edge computing devices with limited computing resources and power budget [7], [11], such as battery-powered mobile and internet of things (IoT) devices.…”
Section: Introduction (mentioning)
confidence: 99%
“…Hence, DNN models with L1/TopK BatchNorm have a better noise-resistant property than DNN models with L2 BatchNorm. Furthermore, the relation between the loss gradient of the weight of a model with BatchNorm and that of a model without BatchNorm is shown in equation (10) [24], where L̂ and L are the loss of the model with and without BatchNorm, respectively, σ_j is the standard deviation of the BatchNorm, γ is the BatchNorm layer trainable parameter, y_j and ŷ_j are the output of the model with and without BatchNorm, respectively, ∇ is the gradient, and m is the batch size.…”
Section: Introduction (mentioning)
confidence: 99%
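Equation (10) itself is not reproduced in this snippet. Assuming it is the standard BatchNorm gradient identity that the citing work attributes to [24], a plausible form consistent with the symbols described (L̂ and L, σ_j, γ, y_j and ŷ_j, ∇, m) is the following, where the inner products are taken over the mini-batch; the exact indexing in the original equation may differ.

```latex
\nabla_{y_j}\hat{L}
  = \frac{\gamma}{m\,\sigma_j}
    \Big(
      m\,\nabla_{y_j}L
      - \big\langle \mathbf{1},\, \nabla_{y_j}L \big\rangle
      - \big\langle \nabla_{y_j}L,\, \hat{y}_j \big\rangle\, \hat{y}_j
    \Big)
```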
“…Despite their unprecedented level of performance and the improvement in their design in recent years, DL models require high computational and energy resources during training and inference [2], [3]. The high computational resource requirement stems from the intense fundamental operations performed by these models, such as vector-matrix dot products and matrix multiplications, during training and inference [4], [5]. This is further complicated by the large increase in the quantity of these operations as the size of the models grows.…”
Section: Introduction (mentioning)
confidence: 99%
“…This tight requirement and the desire to fix compute and memory-transfer bottlenecks in current hardware have led to a significant interest in analog specialized hardware for DL, as it has the potential to deliver at least 2X better performance than conventional digital hardware in both speed and energy efficiency [8], [9]. In fact, such hardware can deliver a projected throughput of multiple tera-operations (TOPs) per second and also achieve femtojoule energy budgets per multiply-and-accumulate (MAC) operation [5], [10]-[12]. The improvement can be attributed to the use of non-volatile memory crossbar arrays to encode DL model weights and biases, a form of computing known as in-memory computing.…”
Section: Introduction (mentioning)
confidence: 99%
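A back-of-the-envelope sketch of the crossbar-style in-memory MAC described in the snippet above: inputs are applied as row voltages, weights are stored as cell conductances, and each column current is the analog dot product by Ohm's and Kirchhoff's laws. The voltage and conductance ranges are assumed values, and the model is idealized (no wire resistance, ADC quantization, or device noise).

```python
import numpy as np

# Idealized crossbar MAC: I_j = sum_i V_i * G_ij, one dot product per column.
rng = np.random.default_rng(2)
V = rng.uniform(0.0, 0.2, size=16)          # input voltages on 16 rows (V)
G = rng.uniform(1e-6, 1e-4, size=(16, 8))   # cell conductances, 16x8 array (S)
I = V @ G                                    # column currents (A), read out in one step
print(I)
```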