2022
DOI: 10.1109/tns.2022.3155820

Reliability Evaluation of LU Decomposition on GPU-Accelerated System-on-Chip Under Proton Irradiation

Abstract: Graphics Processing Units (GPUs) have become a basic accelerator both in high-performance nodes and in low-power SoCs. They provide massive data parallelism and very high performance per watt. However, their reliability in harsh environments is an important issue, especially for safety-critical applications. In this paper, we evaluate the influence of the parallelization strategy on the reliability of LU decomposition on a GPU-accelerated SoC under proton irradiation. Specifically, we compare a memory …
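For readers unfamiliar with the kernel under test, the following is a minimal sequential sketch of LU decomposition (Doolittle form, without row pivoting). It is an illustration only and is not the memory-bound or compute-bound GPU implementation evaluated in the paper; the function names `lu_decompose` and `matmul` are hypothetical.

```python
def lu_decompose(a):
    """Factor a square matrix a into L (unit lower) and U (upper) so that a = L * U.

    Assumption: no row pivoting is performed, so a zero pivot raises an error.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]  # working copy of the input
    for k in range(n):
        pivot = upper[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row pivoting")
        for i in range(k + 1, n):
            factor = upper[i][k] / pivot
            lower[i][k] = factor  # store the elimination multiplier in L
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]  # eliminate below the pivot
    return lower, upper


def matmul(x, y):
    """Plain triple-loop matrix product, used only to check the factorization."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


lower, upper = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
print(lower)  # unit lower-triangular factor
print(upper)  # upper-triangular factor
```

The inner row updates for a fixed `k` are independent of one another, which is what makes the kernel a candidate for different GPU parallelization strategies.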

Cited by 10 publications (6 citation statements)
References 27 publications
“…Parallel strategies reliability has been tested on lower-upper decomposition and a comparison of a memory bound and a compute bound implementation of the decomposition has been proposed. Results show that more intensive use of the resources of the GPU increases the cross section [139].…”
Section: Paper (mentioning)
confidence: 98%
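The cross section mentioned in the statement above is conventionally computed as the number of observed error events divided by the particle fluence delivered to the device. A minimal sketch follows, assuming that standard definition; the function name `cross_section` and the numeric values are hypothetical and are not data from the paper.

```python
def cross_section(events, fluence):
    """Return the cross section in cm^2 per device.

    events:  number of observed errors (e.g. SDCs or crashes) during the run
    fluence: particles per cm^2 delivered during the run
    """
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence


# Hypothetical numbers for illustration only: 50 errors over 1e10 protons/cm^2.
sigma = cross_section(50, 1e10)
print(f"{sigma:.2e} cm^2")
```

Under this definition, two implementations exposed to the same fluence can be compared directly by their error counts, which is how a more resource-intensive implementation shows up as a larger cross section.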
“…
Ref. | Device(s) | Code(s) | Radiation | Main Considerations
[123] | GPU | MxM | neutrons | first public data on GPUs reliability
[7], [124] | GPU | various | neutrons | scheduler and parallelism management is vulnerable and critical
[125], [126] | GPU | MxM, FFT | neutrons | multiple output elements can be corrupted by a single particle
[127] | GPU, Xeon Phi | various | neutrons | the parallel architecture influences the code sensitivity and error criticality
[128] | GPU, ARM, FPGA | various | neutrons | strong dependence between computing architecture and code sensitivity
[11] | GPU | MxM, CNNs | neutrons | multiple corruptions cause misclassification on CNNs
[129] | tensor cores | MxM | neutrons | tensor cores have higher error rate and different fault model
[130] | GPU, Xeon Phi, FPGA | various | neutrons | low precision reduces the error rate but has a higher impact on the output
[131] | GPU | MxM, Yolov3 | neutrons | most DUEs are generated in hidden hardware resources
[132] | GPU DDR | various | neutrons | on-board DDR are prone to experience permanent faults
[133], [134] | FPGA | MNIST | neutrons | high error rate, reduced with lower precision implementation
[135] | NeuroShield | CNNs | neutrons | robust setup and simple fault model
[13] | Google TPU | conv., CNNs | neutrons | characterization of atomic operations and CNN fault model
[136] | Versal SoC | various | neutrons | neutrons and protons data, no permanent effect
[137] | Flashed-based FPGA | LeNet | neutrons | low precision increase fault criticality
[138], [139] | GPU SoC | MxM, LuD | protons | software implementation and parallelism impact the GPU error rate
[140] | AMD GPU | various | protons | FIT rate and behavior under protons
[141] | Versal ACAP | various | protons | neutrons and 64MeV protons SEL and SEU data on Programable Logic
[142] | Versal SoC | various | ions | comparison of protons and ios, no SEL
[143] | GPU | various | ions | overview of heavy ion test setup and data
[144] | AI accelerators | various | ions | extensive comparison of the reliability of AI accelerators for in space
[145], [146] | Myriad VPU | various | ions | no latchup, low error rate in DDR, potentially good for space mission
[147] …”
Section: Paper (mentioning)
confidence: 99%
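“…For the Xilinx 28 nm Zynq-7000 SoC, Tambara et al. [33][34][35] used α particles and neutrons to characterize single-event effects in the SRAM of ultra-low-power SoCs [39]. Badia et al. [40,41] compared the performance and radiation sensitivity of a GPU-accelerated SoC under proton irradiation for different levels of resource usage: the more resources are used, the larger the single-event-effect cross section; most radiation-induced errors led to execution aborts and system reboots.…”
Section: Estimation of proton irradiation results for the Arm® Cortex™-A9 processor unit (unclassified)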
“…To be able to evaluate safety-critical and HPC applications reliability on GPUs, the research community has been carefully employing both fault-injection [8]- [13] and beam experiments [3], [14]- [16]. While beam experiments provide a realistic analysis but lack fault propagation visibility, fault simulation allows complete observation of the fault propagation, but it is limited to a subset of the user-accessible resources.…”
Section: Motivation and Problems Addressed (mentioning)
confidence: 99%