Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applicatio 2016
DOI: 10.1145/2983990.2984032

Portable inter-workgroup barrier synchronisation for GPUs

Cited by 21 publications (3 citation statements)
References 26 publications
“…In fact, Memalloy can also be used to check libraries under weak memory. Prior work has (manually) verified that stack, queue, and barrier libraries implement their specifications under weak memory models [8,52]; here we show how checking these types of properties can be automated up to a bounded number of library and client events. We see this as a straightforward first-step towards a general verification effort.…”
Section: Checking Metatheoretical Properties (citation type: mentioning)
confidence: 86%
“…This method is not portable as the number of launched work-groups depends on the device. Sorensen et al [22] extended it to be portable by discovering work-group occupancy dynamically. Their implementation of inter work-group barrier synchronisation is useful when the developer knows there is interaction between work-groups that needs to be synchronised.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…A problem is the synchronization of processing steps over the total amount of data, which might be much larger than the number of work items that can actually run in parallel. Sorensen et al [39] discuss these limitations and propose an inter-workgroup barrier that synchronizes all threads running in parallel by estimating the occupancy, i.e., the number of work items that can run in parallel, and building a barrier using OpenCL atomics.…”
Section: Nested Parallelism (citation type: mentioning)
confidence: 99%
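
The two statements above describe the same idea: estimate how many work-groups are actually co-resident, then build a barrier over only those participants and let them stride over data that may be much larger than the participant count. The following is a minimal CPU-side sketch of that idea in Python, not the paper's implementation and not OpenCL code. Threads stand in for work-groups, a lock stands in for device atomics, and the names `OccupancyPoll`, `Barrier`, and `run_two_phase`, as well as the rule that the first `resident` groups join before the poll closes, are all assumptions made for illustration.

```python
import threading
import time


class OccupancyPoll:
    """Hands out participant ids until closed (a lock emulates device atomics)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._open = True
        self._count = 0

    def try_join(self):
        with self._lock:
            if not self._open:
                return None  # arrived after the poll closed: not a participant
            ident = self._count
            self._count += 1
            return ident

    def close(self):
        with self._lock:
            self._open = False
            return self._count  # the occupancy estimate


class Barrier:
    """Reusable counter barrier with a generation number, sized to the estimate."""

    def __init__(self, participants):
        self._participants = participants
        self._lock = threading.Lock()
        self._arrived = 0
        self._generation = 0

    def _current_generation(self):
        with self._lock:
            return self._generation

    def wait(self):
        with self._lock:
            generation = self._generation
            self._arrived += 1
            if self._arrived == self._participants:
                # Last arrival resets the counter and releases everyone.
                self._arrived = 0
                self._generation += 1
                return
        while self._current_generation() == generation:
            time.sleep(0)  # yield while spinning


def run_two_phase(launched, resident, n):
    """Simulate `launched` groups of which only `resident` join the barrier."""
    poll = OccupancyPoll()
    ids = [poll.try_join() for _ in range(resident)]  # groups scheduled first
    occupancy = poll.close()
    late = [poll.try_join() for _ in range(launched - resident)]  # rejected
    barrier = Barrier(occupancy)
    a = [0] * n
    b = [0] * n

    def workgroup(ident):
        # Phase 1: each participant strides over the whole data set.
        for i in range(ident, n, occupancy):
            a[i] = i * i
        barrier.wait()  # every a[i] is written before anyone reads it
        # Phase 2: reads elements written by other participants.
        for i in range(ident, n, occupancy):
            b[i] = a[(i + 1) % n]

    threads = [threading.Thread(target=workgroup, args=(i,)) for i in ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return occupancy, late, b
```

Calling `run_two_phase(8, 3, 20)` in this sketch simulates eight launched groups of which three become participants. The barrier is sized to the discovered count rather than the launch count, so it cannot wait on groups that were never scheduled, which is the portability concern the Related Work statement raises.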