Proceedings of the 29th ACM on International Conference on Supercomputing 2015
DOI: 10.1145/2751205.2751232

Fine-Grained Synchronizations and Dataflow Programming on GPUs

Abstract: The last decade has witnessed the blooming emergence of many-core platforms, especially the graphic processing units (GPUs). With the exponential growth of cores in GPUs, utilizing them efficiently becomes a challenge. The data-parallel programming model assumes a single instruction stream for multiple concurrent threads (SIMT); therefore little support is offered to enforce thread ordering and fine-grained synchronizations. This becomes an obstacle when migrating algorithms which exploit fine-grained parallelis…

Cited by 41 publications (16 citation statements)
References 21 publications
“…Li, et al [24] propose a lightweight scratchpad memory lock design in software for older Nvidia GPUs (Fermi and Kepler) that uses software atomics for scratchpad memories. Their solution improves local (i.e.…”
Section: GPU Solutions (mentioning)
confidence: 99%
“…Li et al have provided a solution that mandates the programmer to responsibly handle this deadlock by preventing illegal accesses to locked locations in the main storage. This is achieved by using the lock bits appropriately as the lock unit is not configured to track ownership of locks.…”
Section: Deadlocks (mentioning)
confidence: 99%
“…Synchronization remains a performance bottleneck for many applications and has long been a classic problem in computer systems research [7,18,24,34,35]. To evaluate the synchronization cost in SpTRSV, we run a parallel SpTRSV implemented by Park et al [33] based on the aforementioned level-set approach.…”
Section: Motivation For Avoiding Synchronization (mentioning)
confidence: 99%