2020
DOI: 10.1109/tcad.2020.2981056
Safe Overclocking for CNN Accelerators Through Algorithm-Level Error Detection

Abstract: In this paper, we propose a technique for improving the efficiency of CNN hardware accelerators based on timing speculation (overclocking) and fault tolerance. We augment the accelerator with a lightweight error detection mechanism to protect against timing errors in convolution layers, enabling aggressive timing speculation. The error detection mechanism we have developed works at the algorithm level, utilizing algebraic properties of the computation, allowing the full implementation to be realized using High…
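The algebraic property the abstract alludes to is the linearity of convolution: the sum of the per-channel outputs must equal one convolution of the input with the summed filters, so a single redundant convolution can check all output channels. A minimal sketch of this ABFT-style invariant (illustrative only, not the paper's exact hardware mechanism; all sizes and the fault model are assumptions):

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' 2-D cross-correlation."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))           # input feature map (hypothetical size)
filters = rng.standard_normal((4, 3, 3))  # 4 output channels

# Work the accelerator would perform: one convolution per output channel.
outputs = [conv2d(x, w) for w in filters]

# Algorithm-level check: by linearity, sum_k conv(x, w_k) == conv(x, sum_k w_k),
# so one extra convolution verifies all four channels at once.
checksum = conv2d(x, filters.sum(axis=0))
fault_free = np.allclose(sum(outputs), checksum)

# Simulate a timing error corrupting one partial result under overclocking.
outputs[2][1, 1] += 0.25
fault_detected = not np.allclose(sum(outputs), checksum)
```

The check costs one convolution regardless of the channel count, which is why such detection can stay lightweight enough to enable aggressive timing speculation.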

Cited by 20 publications (9 citation statements)
References 30 publications
“…The difference is added back to the final result at the next cycle without stalling the pipeline. Lastly, the forefront research of [123] proposes a technique to improve the efficiency of DNN accelerators with spatial architecture based on overclocking (timing speculation) and inherent error resilience. The authors presented an algorithmic-based lightweight TE detection mechanism to protect convolution layers, enabling aggressive timing speculation.…”
Section: Timing Error Detection
confidence: 99%
“…While the overhead of the naive ABFT is non-trivial, Dionysios Filippas et al [62] proposed a lightweight ABFT implementation, ConvGuard, which predicts the output checksum of convolution implicitly by accumulating only the pixels at the border of the dropped input features. Thibaut Marty et al [63] proposed to utilize the ABFT technique to mitigate timing errors induced by overclocking of the neural network accelerators on FPGAs. Their experiments reveal that the proposed ABFT design poses negligible area overhead, enables aggressive overclocking of the neural network accelerators, and achieves up to 60% throughput improvement of the overall neural network processing.…”
Section: A Related Work
confidence: 99%
“…Sung Kim et al [83] proposed to combine adaptive neural network training and weight memory voltage scaling to achieve energy-efficient neural network processing. Similar cross-layer optimizations that utilize voltage scaling and fault-aware training or high-level fault correction are also applied in many different scenarios [14] [84] [85] [63]. In summary, cross-layer fault-tolerant approaches show promising results generally, and it can be expected that many of the fault-tolerant techniques surveyed in prior sections can also be combined and optimized for more effective protection against hardware faults.…”
Section: Cross-layer Fault Tolerance
confidence: 99%
“…As analysed in [17], ABFT provides high confidence, i.e., close to 100%, in detecting errors in neural computations. While it does not protect the look-up-table-based non-linear operations in the activation layers of the neural net, the delay paths of multiplication and addition circuits are far longer and more likely to suffer from lower-voltage-induced phenomena [28]. In Fig.…”
Section: B Error Detection Through ABFT
confidence: 99%
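The near-100% detection confidence cited above comes from the classic checksum argument: when a convolution is lowered to a matrix multiply (im2col), a Huang–Abraham-style column checksum flags any single corrupted multiply-accumulate, since e^T(AB) = (e^T A)B must hold. A small illustration (matrix shapes and the injected-fault magnitude are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((16, 9))  # im2col'd input patches (hypothetical sizes)
B = rng.standard_normal((9, 4))   # flattened filters

C = A @ B  # the MAC array's work

# Column checksum: the column sums of C must match one extra
# dot product per column, (e^T A) B.
check = A.sum(axis=0) @ B
clean = np.allclose(C.sum(axis=0), check)

# A single corrupted MAC anywhere in a column breaks that column's checksum.
C[5, 2] += 1e-3
corruption_detected = not np.allclose(C.sum(axis=0), check)
```

An error escapes only if corruptions in one column cancel exactly, which is why the quoted survey reports detection confidence close to 100%.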