Proceedings of the 2016 International Conference on Parallel Architectures and Compilation 2016
DOI: 10.1145/2967938.2967951

Combating the Reliability Challenge of GPU Register File at Low Supply Voltage

Cited by 27 publications (24 citation statements)
References 31 publications
“…[4] extends aggressive undervolting to multi-core CPUs and [18] leveraged built-in Error-Correcting Code (ECC) technique to detect and mitigate faults in Intel Itanium II. • GPUs: As an example of commercial GPUs, [44] studied this approach in GPU register files and proposed an architectural solution that leverages long register dead time to enable reliable operations from unreliable register file at low voltages. • ASICs: As an example of ASICs, [45] evaluated the Floating Point Units (FPUs) under timing violations and accordingly, presented a bit-level fault model.…”
Section: A Aggressive Undervolting Technique On Real Hardware
type: mentioning
confidence: 99%
“…Characterizing these faults can allow better power and reliability trade-offs, without performance degradation, as is for DVFS approach. Among the real hardware devices, this approach is extensively studied for modern processors [10], [11], [12], [13]; however, there are several recent efforts on other hardware devices, as well, i.e., GPUs [4], ASICs [14], [15], and memory systems [5], [16]. In parallel, several simulation-based framework [17] or design optimization [18], [19] are also proposed to study undervolting through nano-meter technology parameters; however, it is evident that this approach lacks the exact information of the fault model under very low-voltage operations and their validation on the silicon remains a key question.…”
Section: Related Work
type: mentioning
confidence: 99%
“…Therefore, characterization of these undervolting faults and understanding their behavior is critical to mitigate their impact. Although, there have been some previous undervolting works on CPUs [3], Graphic Processor Units (GPUs) [4], and Dynamic RAM (DRAM) memories [5], there are no "deep-dive" undervolting fault characterization studies due to the relatively closed nature of these hardware substrates where the vendors expose few details. In comparison, the relatively open Field Programmable Gate Array (FPGA) architectures make it possible to conduct and report such detailed studies.…”
Section: Introduction
type: mentioning
confidence: 99%
“…Although Gebhart et al [7] propose an unified on-chip memory structure that the capacity of register file, scratchpad memory, and L1 cache can be partitioned at runtime according to the requirement of applications in a fine-grained way, there are still two shortcomings. First, the unified structure lacks flexibility; register file is one of the main contributors to GPU energy consumption and various power saving technologies [11,14,23,32-34] are proposed for register file to save energy, which can be hard to apply to the unified structure due to the different access characteristics between register file and L1 cache. Second, the unified structure increases bank conflicts between register file, scratchpad memory and L1 cache; they use software-managed hierarchical register file [6] to reduce the required bandwidth to the main register file, however, that technology focuses on energy efficiency and may lead to resource underutilization and suboptimal performance [29,35].…”
Section: Evaluation For Advanced Architecture
type: mentioning
confidence: 99%