2020 IEEE 26th International Symposium on On-Line Testing and Robust System Design (IOLTS)
DOI: 10.1109/iolts50870.2020.9159704
High-level Modeling of Manufacturing Faults in Deep Neural Network Accelerators

Abstract: The advent of data-driven real-time applications requires the implementation of Deep Neural Networks (DNNs) on Machine Learning accelerators. Google's Tensor Processing Unit (TPU) is one such neural network accelerator that uses systolic array-based matrix multiplication hardware for computation at its core. Manufacturing faults at any state element of the matrix multiplication unit can cause unexpected errors in these inference networks. In this paper, we propose a formal model of permanent faults and their p…

Cited by 11 publications (9 citation statements)
References 10 publications
“…Fault resilience is an important characteristic of DNNs that can be defined as a function of accuracy loss. DNN accelerators inherit such resilience for a considerable range of fault bits, ensuring reliable computation [10]. However, this resilience may become insignificant when faults degrade performance by affecting the MSBs, especially when the errors remain unmasked in the resulting outputs.…”
Section: A. Fault Resilience
confidence: 99%
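The MSB sensitivity described in the quote above can be illustrated with a minimal sketch of a single stuck-at-1 fault on one bit of a weight register (the 8-bit two's-complement width, function name, and bit positions here are illustrative assumptions, not details from the cited work):

```python
def inject_stuck_at_one(value: int, bit: int, width: int = 8) -> int:
    """Force one bit of a two's-complement integer to 1 (stuck-at-1 fault)."""
    mask = 1 << bit
    raw = (value & ((1 << width) - 1)) | mask
    # Reinterpret the raw bit pattern as a signed two's-complement value.
    if raw >= 1 << (width - 1):
        raw -= 1 << width
    return raw

w = 3  # original 8-bit weight: 0b00000011
print(inject_stuck_at_one(w, bit=0))  # LSB fault: 3 -> 3 (masked, bit already 1)
print(inject_stuck_at_one(w, bit=6))  # high-bit fault: 3 -> 67 (large error)
print(inject_stuck_at_one(w, bit=7))  # sign-bit fault: 3 -> -125 (sign flips)
```

A fault on a low-order bit is often masked or contributes a small error, while the same fault on an MSB changes the magnitude (or sign) of the weight drastically, which matches the accuracy-loss behaviour the citing paper describes.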
“…However, the impact of a fault may vary according to the activation function, fault type, bit position, and approximation error resilience of each layer. Since permanent faults affect the performance of accurate DNNs more significantly than occasional transient faults [10], their impact may be even more prominent in AxDNNs due to their inexact nature.…”
Section: Introduction
confidence: 99%
“…The safety-critical bits in a machine learning system were identified by inducing soft errors in the network datapath and evaluating them on eight DNN models across six datasets [21]. A formal analysis of the impact of faults in the datapath of an accelerator has been illustrated using the Discrete-Time Markov Chain (DTMC) formalism [22]. The severe performance penalty in a systolic array-based DNN accelerator has been demonstrated by inducing manufacturing defects in the datapath of the accelerator [4], [23], [24].…”
Section: B. DNN Accelerator and Its Reliability
confidence: 99%
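The kind of fault injection surveyed above — a permanent defect in one processing element (PE) of a systolic array corrupting the output it accumulates — can be sketched with a simplified output-stationary model (the function name, PE coordinates, and stuck-at-1 accumulator bit are illustrative assumptions, not the cited papers' exact setups):

```python
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray,
                    fault_pe=None, fault_bit: int = 6) -> np.ndarray:
    """Integer matmul accumulated PE by PE, as in an output-stationary
    systolic array. If fault_pe=(i, j) is given, the accumulator of that
    PE has a permanent stuck-at-1 fault on `fault_bit`, applied after
    every accumulation step."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n), dtype=np.int64)
    for i in range(m):
        for j in range(n):
            acc = 0
            for t in range(k):
                acc += int(A[i, t]) * int(B[t, j])
                if fault_pe == (i, j):
                    acc |= 1 << fault_bit  # stuck-at-1 on one accumulator bit
            C[i, j] = acc
    return C

A = np.arange(4).reshape(2, 2)
B = np.eye(2, dtype=int)
print(systolic_matmul(A, B))                   # fault-free: equals A @ B
print(systolic_matmul(A, B, fault_pe=(0, 0)))  # only C[0, 0] is corrupted
```

Because the fault is permanent, the corrupted PE poisons its output element on every inference, which is why such defects cause the intense, persistent accuracy penalty the citing paper contrasts with occasional transient faults.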
“…However, the most recent works' main concentration is limited to AccDNNs. Recently, Kundu et al elucidated the impact of masking and non-masking of permanent faults in small-scaled Google-TPU-like systolic array-based DNN accelerators with their formal guarantees [20]. In another work, Santos et al injected the permanent faults in the register files of GPU to demonstrate their effect on reduced precision AccDNNs [21].…”
mentioning
confidence: 99%