2021
DOI: 10.1007/s11227-021-03751-2
|View full text |Cite
|
Sign up to set email alerts
|

DYRE: a DYnamic REconfigurable solution to increase GPGPU’s reliability

Abstract: General-purpose graphics processing units (GPGPUs) are extensively used in high-performance computing. However, it is well known that these devices’ reliability may be limited by the rising of faults at the hardware level. This work introduces a flexible solution to detect and mitigate permanent faults affecting the execution units in these parallel devices. The proposed solution is based on adding some spare modules to perform two in-field operations: detecting and mitigating faults. The solution takes advant… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1

Citation Types

0
1
0

Year Published

2021
2021
2023
2023

Publication Types

Select...
2
1
1

Relationship

1
3

Authors

Journals

citations
Cited by 4 publications
(1 citation statement)
references
References 27 publications
0
1
0
Order By: Relevance
“…Their higher complexity and the huge amount of computing units make GPU hardening more challenging than for CPUs. Some GPU mitigation solutions based on Built-In Self-Repair (BISR), exploiting spare modules to replace faulty units, have also been proposed [29]- [31]. Furthermore, some authors proposed the reconfiguration of computational modules [32], [33] and memories [34] in GPUs once a fault is detected.…”
Section: B Mitigation Strategiesmentioning
confidence: 99%
“…Their higher complexity and the huge amount of computing units make GPU hardening more challenging than for CPUs. Some GPU mitigation solutions based on Built-In Self-Repair (BISR), exploiting spare modules to replace faulty units, have also been proposed [29]- [31]. Furthermore, some authors proposed the reconfiguration of computational modules [32], [33] and memories [34] in GPUs once a fault is detected.…”
Section: B Mitigation Strategiesmentioning
confidence: 99%