2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) 2020
DOI: 10.1109/isvlsi49217.2020.00076

Analyzing the Sensitivity of GPU Pipeline Registers to Single Events Upsets

Abstract: Graphics processing units are available solutions for high-performance safety-critical applications, such as self-driving cars. In this application domain, functional safety and reliability are major concerns. Thus, the adoption of fault tolerance techniques is mandatory to detect or correct faults, since these devices must work properly, even when faults are present. GPUs are designed and implemented with cutting-edge technologies, which makes them sensitive to faults caused by radiation interference, such as …

Citations: cited by 6 publications (3 citation statements)
References: 20 publications
“…In contrast, the low-level micro-architectural evaluation can provide detailed information about the identification of vulnerable locations on most internal modules. In [20], [21], and [22], the authors proposed some reliability evaluation methodologies based on microarchitectural evaluation on RT-level descriptions. In these works, fine-grain evaluation mainly uses two approaches to identify vulnerable locations: i) exhaustive evaluation of all sites in the modules, and ii) statistical samples of the sites in the modules.…”
Section: A Methodologies For Reliability Evaluation
Citation type: mentioning
confidence: 99%
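
The two approaches named in the statement above, exhaustive evaluation of all sites and statistical sampling of sites, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited works: `toy_stage` is a hypothetical stand-in for the logic fed by one 32-bit pipeline register, a "failure" is any change in its output, and the fault model is a single bit flip.

```python
import random

WIDTH = 32  # assumed register width, for illustration only


def toy_stage(reg: int) -> int:
    """Hypothetical stage: only the low 8 bits of the register reach the output."""
    return (reg & 0xFF) * 3


def flip_bit(value: int, bit: int) -> int:
    """Single-bit upset model: invert one bit of the register value."""
    return value ^ (1 << bit)


def is_failure(reg: int, bit: int) -> bool:
    """A fault counts as a failure if it changes the stage output."""
    return toy_stage(flip_bit(reg, bit)) != toy_stage(reg)


def exhaustive_rate(reg_values, width=WIDTH):
    """Approach (i): inject a fault at every (value, bit) site."""
    sites = [(v, b) for v in reg_values for b in range(width)]
    return sum(is_failure(v, b) for v, b in sites) / len(sites)


def sampled_rate(reg_values, n_samples, width=WIDTH, seed=0):
    """Approach (ii): inject faults at uniformly sampled sites only."""
    rng = random.Random(seed)  # seeded so the campaign is repeatable
    failures = 0
    for _ in range(n_samples):
        value = rng.choice(reg_values)
        bit = rng.randrange(width)
        failures += is_failure(value, bit)
    return failures / n_samples


if __name__ == "__main__":
    values = [0, 1, 0xDEADBEEF]
    print("exhaustive:", exhaustive_rate(values))
    print("sampled   :", sampled_rate(values, n_samples=2000))
```

In this toy model the exhaustive campaign visits every site, while the sampled campaign trades some accuracy for a fixed, smaller number of injections; the sampled estimate should approach the exhaustive one as `n_samples` grows.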
“…
- Detection with redundancy without diversity [163,6,191,178,199,57,73,138,99,164,34,170,120,140,77,127,139] and with diversity [13,14,8,12]
- Detection and/or correction with coding (e.g., ECC) and checkers [57,120,169,139,108,124,125,140,132]
- Recovery with re-execution or checkpoints [71,175,180,135,115,124]
- Mitigation with shielding and reconfiguration [153,127,91,193]
• Application-dependent:…”
Section: Random Hw Failures
Citation type: mentioning
confidence: 99%
“…Processing Units. Due to the limited public documentation, controllability and observability of SM architectures, few works target the reliability of SMs explicitly. In general, works attempt to improve SMs (and other components simultaneously) employing different software-based redundancy techniques for the whole SM, its cores only, or parts of those cores (e.g., pipeline registers [163]) using available underutilized resources in order to reduce the computing overhead [6,191,178,199,57,73]. Some authors combine software redundancy with diversity to mitigate common cause failures by, for instance, making redundant threads execute with some staggering in different cores [13,14,8].…”
Section: Components
Citation type: mentioning
confidence: 99%