2014
DOI: 10.1007/978-3-319-06200-6_18

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

Abstract: We describe the design and implementation of methods to support reasoning about data races in GPU kernels where constructs other than the standard barrier primitive are used for synchronization. At one extreme we consider kernels that exploit implicit, coarse-grained synchronization between threads in the same warp, a feature provided by many architectures. At the other extreme we consider kernels that reduce or avoid barrier synchronization through the use of atomic operations. We discuss design dec…
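To make the two kinds of synchronization in the abstract concrete, here is a minimal, deliberately simplified race check over a recorded list of shared-memory accesses. It is an illustrative sketch, not the paper's or GPUVerify's encoding: the `Access` record, the `races` function, the lockstep rule and `WARP_SIZE = 32` are all assumptions. A barrier orders everything in different barrier intervals. Under an optional warp-lockstep assumption, threads of the same warp are ordered across different instruction steps. Two atomic accesses to the same location, as is commonly assumed, do not race with each other.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

WARP_SIZE = 32  # assumption: warp width; this is architecture-dependent


@dataclass(frozen=True)
class Access:
    thread: int    # thread id within the block
    interval: int  # barrier interval; different intervals are separated by a barrier
    step: int      # instruction index inside the interval (used for lockstep ordering)
    loc: str       # shared-memory location, e.g. "A[1]"
    kind: str      # "read", "write" or "atomic"


def conflicting(a: Access, b: Access) -> bool:
    """Same location, different threads, and not a harmless pair of accesses."""
    if a.loc != b.loc or a.thread == b.thread:
        return False
    if a.kind == "read" and b.kind == "read":
        return False  # two plain reads never race
    if a.kind == "atomic" and b.kind == "atomic":
        return False  # atomics on the same location do not race with each other
    return True


def ordered(a: Access, b: Access, warp_lockstep: bool) -> bool:
    """True if this simplified model guarantees an order between a and b."""
    if a.interval != b.interval:
        return True  # a barrier separates the two accesses
    same_warp = a.thread // WARP_SIZE == b.thread // WARP_SIZE
    # Hypothetical lockstep rule: threads of one warp are ordered across
    # different steps; accesses issued in the same step are still reported.
    return warp_lockstep and same_warp and a.step != b.step


def races(accesses: List[Access], warp_lockstep: bool = True) -> List[Tuple[Access, Access]]:
    """All pairs of conflicting accesses that the model leaves unordered."""
    return [(a, b) for a, b in combinations(accesses, 2)
            if conflicting(a, b) and not ordered(a, b, warp_lockstep)]


if __name__ == "__main__":
    # Thread 0 reads A[1] one step before thread 1 (same warp) overwrites it.
    trace = [Access(0, 0, 0, "A[1]", "read"), Access(1, 0, 1, "A[1]", "write")]
    print(len(races(trace)))                       # ordered by warp lockstep
    print(len(races(trace, warp_lockstep=False)))  # racy without that assumption
```

The `warp_lockstep` flag mirrors the design question raised in the abstract: the same trace is race-free if lockstep execution within a warp is assumed and racy if it is not, so a verifier has to decide which semantics it checks against.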

Cited by 30 publications (28 citation statements)
References 14 publications
“…The first stage of the C11 semantics translates a program into a set of executions called its basic set.⁵ Each execution in this set is compatible with the instructions of the individual threads, but the set is constructed without considering the behaviour of shared memory, so it provides an over-approximation of the executions that will ultimately be allowed to happen once the whole program and the memory model are taken into account. For instance, the execution in Example 2 is a basic execution of the program in Example 1: the values of the write events correspond to the program text, but the values of the read events are arbitrary and the basic set of all executions ranges over all choices.…”
Section: C11 Executions (mentioning)
confidence: 99%
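The basic-set construction quoted above can be sketched by enumeration. The program below is a hypothetical two-thread example (not the citing paper's Example 1), and the finite value domain `VALUES` is an assumption made only so that the set is finite. Write values are copied from the program text, while each read value ranges over every choice in the domain, so the result over-approximates what the memory model will eventually allow.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical program:  thread 1: x = 1;   thread 2: r = x;
# Each instruction is (kind, location, value); a read's value is None
# because the program text does not determine it.
Instr = Tuple[str, str, Optional[int]]
PROGRAM: Dict[int, List[Instr]] = {
    1: [("W", "x", 1)],
    2: [("R", "x", None)],
}
VALUES = [0, 1, 2]  # assumption: small finite value domain

Event = Tuple[int, str, str, Optional[int]]  # (thread, kind, location, value)


def basic_set(program: Dict[int, List[Instr]], values: List[int]) -> List[List[Event]]:
    """Writes take their value from the program text; every read ranges over `values`."""
    reads = [(t, i) for t, instrs in sorted(program.items())
             for i, (kind, _, _) in enumerate(instrs) if kind == "R"]
    executions = []
    for choice in product(values, repeat=len(reads)):
        chosen = dict(zip(reads, choice))  # one arbitrary value per read event
        execution = []
        for t, instrs in sorted(program.items()):
            for i, (kind, loc, val) in enumerate(instrs):
                execution.append((t, kind, loc, chosen[(t, i)] if kind == "R" else val))
        executions.append(execution)
    return executions


if __name__ == "__main__":
    for execution in basic_set(PROGRAM, VALUES):
        print(execution)
```

The execution in which thread 2 reads 2 belongs to this basic set even though no write produces that value; discarding it is the job of the later, memory-model stage.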
“…• the reads-from relation links write events to read events, such that every read observes exactly one write, and the locations and … [Footnote 5: This set is sometimes called the 'pre-executions' [7] or the 'opsems' [39].]…”
Section: C11 Executions (mentioning)
confidence: 99%
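A well-formedness check for the reads-from relation described above might look like the following sketch. Since the quotation is cut off, the condition that a read and the write it reads from agree on location and value is taken here from the standard C11 axiomatic model rather than from the citing text. Event ids, the `rf_well_formed` name and the explicit initialization write are illustrative assumptions.

```python
from typing import Dict, Tuple

Event = Tuple[str, str, int]  # (kind "R" or "W", location, value)


def rf_well_formed(events: Dict[str, Event], rf: Dict[str, str]) -> bool:
    """Check a reads-from map from read-event ids to write-event ids.

    A dict already maps each read to at most one write; requiring every read
    to be a key turns that into "exactly one write per read".
    """
    reads = {e for e, (kind, _, _) in events.items() if kind == "R"}
    if set(rf) != reads:
        return False  # some read observes no write, or a non-read has an rf edge
    for r, w in rf.items():
        if w not in events or events[w][0] != "W":
            return False  # the source of an rf edge must be a write event
        if events[w][1:] != events[r][1:]:
            return False  # location and value must agree (standard C11 condition)
    return True


if __name__ == "__main__":
    # Initialization is modelled as an explicit write, a common convention.
    events = {"init_x": ("W", "x", 0), "w1": ("W", "x", 1), "r1": ("R", "x", 1)}
    print(rf_well_formed(events, {"r1": "w1"}))      # value 1 matches
    print(rf_well_formed(events, {"r1": "init_x"}))  # value mismatch
```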
“…Extensions of these methods support atomic operations to a limited extent [9,14], but neither provides a precise analysis accounting for weak behaviours. The CUDA-MEMCHECK [40] tool, provided with the CUDA SDK, dynamically checks for illegal memory accesses and data-races, but does not account for weak memory effects.…”
Section: Related Work (mentioning)
confidence: 99%
“…To capture action repetition, the behavior of processes also can be described using a recursive definition, which must be paired with a contract. See for example the definition of process get_all in Listing 48 (lines 12-15).…”
Section: Reasoning With Histories (mentioning)
confidence: 99%
“…Bardsley et al propose additional support in GPUVerify for reasoning about GPU kernels where warps and atomic operations are used for synchronisation [14]. In GPUVerify the user does not need to add specifications manually, because the tool internally speculates and refines kernel specifications [17].…”
Section: Conclusion and Related Work (mentioning)
confidence: 99%
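The last statement notes that GPUVerify speculates and refines kernel specifications internally. A classic procedure of this kind is Houdini (Flanagan and Leino), which GPUVerify's candidate-invariant inference is generally described as building on. The sketch below shows only the speculate-and-refine loop and makes no claim about GPUVerify's implementation. A real tool discharges each check with a verifier; here inductiveness is checked by brute force over a finite state domain, and the loop, its candidates and the `houdini` signature are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Set

Pred = Callable[[int], bool]


def houdini(candidates: Dict[str, Pred], init: int, guard: Pred,
            step: Callable[[int], int], domain: Iterable[int]) -> Set[str]:
    """Return the largest subset of `candidates` that is inductive for the loop.

    Brute force over the finite `domain` stands in for verifier calls, so the
    result is only as good as that domain.
    """
    states = list(domain)
    # Speculate: keep the candidates that hold on loop entry.
    alive = {name for name, p in candidates.items() if p(init)}
    changed = True
    while changed:
        changed = False
        # States where every surviving candidate holds and the loop body runs.
        pre = [s for s in states if guard(s) and all(candidates[n](s) for n in alive)]
        # Refine: drop every candidate that one loop iteration can falsify.
        for name in sorted(alive):
            if not all(candidates[name](step(s)) for s in pre):
                alive.discard(name)
                changed = True
    return alive


if __name__ == "__main__":
    # Hypothetical loop: i = 0; while (i < 10) i += 2;
    cands = {
        "i >= 0": lambda i: i >= 0,
        "i % 2 == 0": lambda i: i % 2 == 0,
        "i <= 10": lambda i: i <= 10,
        "i < 10": lambda i: i < 10,
        "i == 0": lambda i: i == 0,
    }
    print(sorted(houdini(cands, 0, lambda i: i < 10, lambda i: i + 2, range(-5, 20))))
```

In this example `i == 0` is dropped in the first round and `i < 10` in the second, because one iteration from `i = 8` reaches 10. The surviving conjunction `i >= 0`, `i % 2 == 0`, `i <= 10` is the kind of specification such a tool infers without user annotations.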