2019 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)
DOI: 10.1109/rtas.2019.00011
Fractional GPUs: Software-Based Compute and Memory Bandwidth Reservation for GPUs

Cited by 52 publications (29 citation statements)
References 17 publications
“…Instead, those variations are caused by the initial core state (e.g. branch predictor state), or changes in instruction cache behavior due to changes in the memory alignment of the binaries with and without thread redundancy. In the case of the FAC benchmark, since it is a small benchmark (around 700 instructions only), these tiny effects have a visible impact in relative terms (e.g.…”
Section: B. Execution Time Overhead
confidence: 99%
“…[30] implement DCLS, whereas some Arm Cortex-R5 designs implement Triple-Core Lockstep [5], but fail to provide enough performance for AD systems [31]. Some improvements shorten time-to-detection for errors [32] or enhance recovery processes [33], but do not improve performance.…”
Section: Our Approach
confidence: 99%
“…Analogous solutions for accelerators (e.g. GPUs or the Kalray MPPA family) have been proposed, either with hardware support [27], [15], [16], [17] or with software-only support [26], [27], [28], but none of them guarantees diversity to protect against CCFs. Some preliminary solutions guarantee diversity to some extent for GPUs either with [13] or without hardware support [3].…”
Section: Our Approach
confidence: 99%
“…In [39], the authors show how to partition GPU memory resources, including cache and main memory, to enforce strong isolation between concurrent kernels. However, the approach is highly platform-specific, requiring a great deal of reverse engineering; it is focused on discrete GPUs rather than integrated CPU-GPU SoCs, and it does not protect the GPU from CPU interference.…”
Section: Memory-Aware Framework on GPU
confidence: 99%
“…We further assume that only one GPU kernel is executed at a time. While recent work has shown that co-scheduling multiple kernels can improve GPU resource utilization [76], [39], it also complicates timing analysis. For this reason, we leave such an extension to future work.…”
Section: System Model and Assumptions
confidence: 99%