Recent work has demonstrated the promise of resistive random access memory (ReRAM) as an emerging technology for performing inherently parallel, analog-domain, in-situ matrix-vector multiplication, the intensive and key computation in deep neural networks (DNNs). One key problem is that DNN weights are signed values. However, in a ReRAM crossbar, weights are stored as the conductance of the crossbar cells, and the in-situ computation assumes that all cells on each crossbar column have the same sign. Current architectures either use two ReRAM crossbars for positive and negative weights (PRIME) or add an offset to the weights so that all values become positive (ISAAC). Neither solution is ideal: they either double the crossbar cost or incur extra offset circuitry. To better address this problem, we propose FORMS, a fine-grained ReRAM-based DNN accelerator with algorithm/hardware co-design. Instead of trying to represent positive and negative weights, our key design principle is to enforce exactly what the in-situ computation assumes: all weights in the same column of a crossbar have the same sign. This naturally avoids the cost of an additional crossbar. Such polarized weights can be generated using alternating direction method of multipliers (ADMM) regularized optimization during DNN training, which can exactly enforce certain patterns in the DNN weights. To achieve high accuracy, we divide the crossbar into logical sub-arrays and enforce this property only within the fine-grained sub-array columns. Crucially, the small sub-arrays provide a unique opportunity for input zero-skipping, which avoids unnecessary computations and significantly reduces computation time. It also makes the hardware much easier to implement and less susceptible to non-idealities and noise than coarse-grained architectures. Putting it all together, with the same optimized DNN models, FORMS achieves 1.50× and 1.93× throughput improvement in terms of GOPs/(s×mm²) and GOPs/W over ISAAC, and a 1.12× to 2.4× speedup in frames per second over optimized ISAAC at almost the same power/area cost. Interestingly, the FORMS optimization framework can even speed up the original ISAAC by 10.7× to 377.9×, reflecting the importance of software/hardware co-design optimizations.
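To make the polarization constraint concrete, the following minimal sketch shows one way an ADMM-style training loop could project a weight tile onto the set of "polarized" matrices, where every column of each logical sub-array carries a single sign. This is an illustrative assumption about the projection step, not the paper's exact algorithm; the sub-array height `h` and the rule of zeroing the cheaper-to-remove sign are hypothetical choices.

```python
import numpy as np

def polarize_columns(W: np.ndarray, h: int) -> np.ndarray:
    """Euclidean projection of W onto column-polarized matrices:
    within every h-row sub-array, each column becomes all >= 0 or
    all <= 0 (opposite-sign entries are zeroed).

    Illustrative sketch only; h and the projection rule are
    assumptions, not the authors' implementation.
    """
    W = W.copy()
    rows, _ = W.shape
    for r in range(0, rows, h):
        block = W[r:r + h, :]                                 # one sub-array slice (view)
        pos_loss = (np.minimum(block, 0.0) ** 2).sum(axis=0)  # cost of forcing column >= 0
        neg_loss = (np.maximum(block, 0.0) ** 2).sum(axis=0)  # cost of forcing column <= 0
        make_pos = pos_loss <= neg_loss                       # cheaper sign per column
        block[:, make_pos] = np.maximum(block[:, make_pos], 0.0)
        block[:, ~make_pos] = np.minimum(block[:, ~make_pos], 0.0)
    return W

# Example: an 8x4 weight tile with sub-array height 4
W = np.random.default_rng(0).standard_normal((8, 4))
Wp = polarize_columns(W, h=4)
assert all((c >= 0).all() or (c <= 0).all()
           for r in range(0, 8, 4) for c in Wp[r:r + 4].T)
```

In an ADMM formulation, a projection like this would serve as the proximal step that pulls the trained weights toward the hardware-feasible set while the regular loss gradient preserves accuracy.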
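The input zero-skipping opportunity can likewise be illustrated with a small functional model. The sketch below assumes a simple mechanism in which rows of a sub-array whose input activation is zero are simply not driven, so they cost no cycles; the real design operates on serially streamed input bits, so treat this purely as a hedged illustration of the idea, not the paper's circuit.

```python
import numpy as np

def subarray_mvm_zero_skip(G: np.ndarray, x: np.ndarray):
    """Functional sketch of in-situ MVM on one sub-array where rows
    with zero input are never activated.
    Returns the output vector and the number of row activations issued.
    Hypothetical model; the actual hardware streams input bit slices."""
    out = np.zeros(G.shape[1])
    issued = 0
    for i, xi in enumerate(x):
        if xi == 0:          # zero-skipping: no row drive, no cycle spent
            continue
        out += xi * G[i, :]  # analog accumulation along the bitlines
        issued += 1
    return out, issued

G = np.abs(np.random.default_rng(1).standard_normal((4, 4)))  # polarized conductances
x = np.array([0.0, 0.7, 0.0, 0.2])                            # sparse input
y, cycles = subarray_mvm_zero_skip(G, x)
print(y, cycles)  # only 2 of the 4 rows are activated
```

Because a small sub-array sees only a few inputs at a time, the chance that its entire input slice is zero (and can be skipped outright) is much higher than for a full-height crossbar column, which is why the fine-grained partitioning and zero-skipping reinforce each other.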