Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

Bardsley, Ethel; Donaldson, Alastair F.

doi:10.1007/978-3-319-06200-6_18

Cited by 30 publications

(28 citation statements)

References 14 publications


“…The first stage of the C11 semantics translates a program into a set of executions called its basic set.⁵ Each execution in this set is compatible with the instructions of the individual threads, but the set is constructed without considering the behaviour of shared memory, so it provides an over-approximation of the executions that will ultimately be allowed to happen once the whole program and the memory model are taken into account. For instance, the execution in Example 2 is a basic execution of the program in Example 1: the values of the write events correspond to the program text, but the values of the read events are arbitrary and the basic set of all executions ranges over all choices.…”

Section: C11 Executions (mentioning)

confidence: 99%
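
The construction described in this citation statement can be sketched in a few lines. This is only an illustrative sketch, not the formalisation used in the citing paper: the two-thread message-passing program, the event tuple layout, the finite value domain, and the function name `basic_set` are all assumptions introduced here, since Example 1 of the citing paper is not reproduced on this page.

```python
from itertools import product

# Hypothetical two-thread message-passing program (NOT Example 1 of the citing paper).
# Each event is (thread, kind, location, value); reads hold None until a value is chosen.
PROGRAM = [
    (0, "W", "x", 1),
    (0, "W", "y", 1),
    (1, "R", "y", None),
    (1, "R", "x", None),
]

# Assumed finite value domain for reads: the initial value 0 plus every written value.
VALUES = sorted({0} | {v for (_, kind, _, v) in PROGRAM if kind == "W"})


def basic_set(program, values):
    """Enumerate candidate executions without consulting any memory model.

    Write events keep the values given by the program text; every read event
    ranges over all values in the domain, so the result over-approximates the
    executions a memory model would finally allow.
    """
    read_slots = [i for i, ev in enumerate(program) if ev[1] == "R"]
    executions = []
    for choice in product(values, repeat=len(read_slots)):
        events = list(program)
        for slot, value in zip(read_slots, choice):
            thread, kind, loc, _ = events[slot]
            events[slot] = (thread, kind, loc, value)
        executions.append(tuple(events))
    return executions


executions = basic_set(PROGRAM, VALUES)
```

With two reads and a two-value domain, the sketch yields one candidate execution per combination of read values. A memory model would then act as a filter over this list, rejecting the combinations it forbids.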


Overhauling SC atomics in C11 and OpenCL

2016

Self Cite


Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic operations and fences in the C11 and OpenCL memory models is subtle, leading to convoluted prose descriptions that translate to complex axiomatic formalisations. We conduct an overhaul of SC atomics in C11, reducing the associated axioms in both number and complexity. A consequence of our simplification is that the SC operations in an execution no longer need to be totally ordered. This relaxation enables, for the first time, efficient and exhaustive simulation of litmus tests that use SC atomics. We extend our improved C11 model to obtain the first rigorous memory model formalisation for OpenCL (which extends C11 with support for heterogeneous many-core programming). In the OpenCL setting, we refine the SC axioms still further to give a sensible semantics to SC operations that employ a ‘memory scope’ to restrict their visibility to specific threads. Our overhaul requires slight strengthenings of both the C11 and the OpenCL memory models, causing some behaviours to become disallowed. We argue that these strengthenings are natural, and that all of the formalised C11 and OpenCL compilation schemes of which we are aware (Power and x86 CPUs for C11, AMD GPUs for OpenCL) remain valid in our revised models. Using the HERD memory model simulator, we show that our overhaul leads to an exponential improvement in simulation time for C11 litmus tests compared with the original model, making *exhaustive* simulation competitive, time-wise, with the *non-exhaustive* CDSChecker tool.



“…• the reads-from relation links write events to read events, such that every read observes exactly one write, and the locations and … (Footnote 5: This set is sometimes called the 'pre-executions' [7] or the 'opsems' [39].)…”

Section: C11 Executions (mentioning)

confidence: 99%
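
The reads-from condition quoted in this statement can be checked mechanically. The sketch below is a hedged illustration only: the `Event` record, the function name `rf_well_formed`, and the sample events are hypothetical, and because the quoted sentence is cut off after "the locations and", the requirement that linked events agree on location and value is an assumption taken from the usual shape of axiomatic memory models rather than from the quoted text.

```python
from collections import namedtuple

# Hypothetical event record: unique id, kind ("W" or "R"), location, value.
Event = namedtuple("Event", "eid kind loc val")


def rf_well_formed(events, rf):
    """Check a reads-from relation given as a set of (write_eid, read_eid) pairs."""
    by_id = {e.eid: e for e in events}
    source_of = {}
    for w, r in rf:
        write, read = by_id[w], by_id[r]
        # The relation must link a write event to a read event.
        if write.kind != "W" or read.kind != "R":
            return False
        # ASSUMPTION: linked events agree on location and value.
        if write.loc != read.loc or write.val != read.val:
            return False
        # A read may not observe more than one write.
        if r in source_of:
            return False
        source_of[r] = w
    # Every read must observe some write.
    return all(e.eid in source_of for e in events if e.kind == "R")


events = [
    Event(0, "W", "x", 0),  # assumed initialisation write
    Event(1, "W", "x", 1),
    Event(2, "R", "x", 1),
]
ok = rf_well_formed(events, {(1, 2)})             # read 2 observes write 1
unsourced = rf_well_formed(events, set())         # read 2 observes nothing
mismatched = rf_well_formed(events, {(0, 2)})     # value 0 does not match read of 1
doubled = rf_well_formed(events, {(0, 2), (1, 2)})  # two sources for one read
```

Only the first relation passes the check. The other three fail for the reasons given in the comments.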

Overhauling SC atomics in C11 and OpenCL

2016

Self Cite


“…Extensions of these methods support atomic operations to a limited extent [9,14], but neither provides a precise analysis accounting for weak behaviours. The CUDA-MEMCHECK [40] tool, provided with the CUDA SDK, dynamically checks for illegal memory accesses and data-races, but does not account for weak memory effects.…”

Section: Related Work (mentioning)

confidence: 99%

Exposing errors related to weak memory in GPU applications

Sorensen; Donaldson

2016

SIGPLAN Not.

Self Cite


We present the systematic design of a testing environment that uses stressing and fuzzing to reveal errors in GPU applications that arise due to weak memory effects. We evaluate our approach on seven GPUs spanning three Nvidia architectures, across ten CUDA applications that use fine-grained concurrency. Our results show that applications that rarely or never exhibit errors related to weak memory when executed natively can readily exhibit these errors when executed in our testing environment. Our testing environment also provides a means to help identify the root causes of such errors, and automatically suggests how to insert fences that harden an application against weak memory bugs. To understand the cost of GPU fences, we benchmark applications with fences provided by the hardening strategy as well as a more conservative, sound fencing strategy.


“…To capture action repetition, the behavior of processes also can be described using a recursive definition, which must be paired with a contract. See for example the definition of process get_all in Listing 48 (lines 12–15).…”

Section: Reasoning With Histories (mentioning)

confidence: 99%

“…Bardsley et al propose additional support in GPUVerify for reasoning about GPU kernels where warps and atomic operations are used for synchronisation [14]. In GPUVerify the user does not need to add specifications manually, because the tool internally speculates and refines kernel specifications [17].…”

Section: Conclusion and Related Work (mentioning)

confidence: 99%

Specification and verification of synchronisation classes in Java: a practical approach

Amighi
