2022 IEEE 28th International Symposium on On-Line Testing and Robust System Design (IOLTS), 2022
DOI: 10.1109/iolts56730.2022.9897823

Effective fault simulation of GPU’s permanent faults for reliability estimation of CNNs

Cited by 4 publications (7 citation statements)
References 16 publications
“…Similarly, the evaluation accuracy directly depends on the definition of the error functions from hardware faults. Moreover, the analyses might be limited to data-path structures in a GPU (e.g., register files, memories, and functional units), and more recently some controllers [27].…”
Section: A Methods For Reliability Assessment Of AI Accelerators
Citation type: mentioning
confidence: 99%
“…Unfortunately, the accuracy of the method directly depends on the targeted units (mostly data path units) and the available error models to represent faults. In literature, some works [32] addressed the software-based characterization of large workloads (CNNs) in GPUs when errors affected functional units under software error models limited to random bit-flips. Authors in [33] analyzed the impact of errors in CNNs and developed error models to represent corruptions on applications, but neglecting the fine-grain micro-architecture of the underlying hardware.…”
Section: A Motivation and Related Work
Citation type: mentioning
confidence: 99%
“…Authors in [33] analyzed the impact of errors in CNNs and developed error models to represent corruptions on applications, but neglecting the fine-grain micro-architecture of the underlying hardware. In [34], the authors explored a hybrid strategy to represent software errors from faults in GPU controllers. Unfortunately, their analyses were limited to a few structures with considerable evaluation times.…”
Section: A Motivation and Related Work
Citation type: mentioning
confidence: 99%
“…Due to the ever-increasing trend towards having ML models embedded in edge safety-critical systems, researchers have started to investigate the impact of radiation-induced soft errors on the reliability of underlying models considering simulation (e.g., [22][23] [24][25] [26][27] [28][29]), emulation (e.g., [6][14] [30]) and radiation tests (see Table I). While high-level simulation and emulation approaches are highly useful to conduct early soft error assessment and eliminate not suitable options, final systems' configurations must be evaluated through radiation tests.…”
Section: Related Work In Machine Learning Soft Error Assessment and M...
Citation type: mentioning
confidence: 99%