Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture 2011
DOI: 10.1145/2155620.2155655

Hardware transactional memory for GPU architectures

Abstract: Graphics Processing Units (GPUs) have become the accelerator of choice for data-parallel applications, enabling the execution of thousands of threads in a Single-Instruction Multiple-Thread (SIMT) fashion. Using OpenCL terminology, GPUs offer a global memory space shared by all the threads in the GPU, as well as a low-latency local memory space shared by a subset of the threads. The latter is used as a scratchpad to improve the performance of applications. We propose GPU-LocalTM, a hardware transactional memory…
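
The abstract's distinction between global memory and the low-latency local-memory scratchpad is easiest to see in code. Below is a minimal CUDA sketch (CUDA's __shared__ corresponds to OpenCL "local memory") of a per-block histogram: the bins live in the scratchpad, and the atomicAdd updates are the kind of fine-grained synchronization a local-memory transactional scheme would aim to simplify. The kernel and its parameters are illustrative, not taken from the paper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BINS 64

__global__ void histogram(const unsigned char *data, int n, unsigned int *out) {
    __shared__ unsigned int bins[BINS];          // scratchpad shared by the block
    for (int i = threadIdx.x; i < BINS; i += blockDim.x)
        bins[i] = 0;
    __syncthreads();

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&bins[data[i] % BINS], 1u);    // contended update in local memory
    __syncthreads();

    for (int i = threadIdx.x; i < BINS; i += blockDim.x)
        atomicAdd(&out[i], bins[i]);             // merge block result into global memory
}
```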

Cited by 87 publications (35 citation statements)
References 55 publications

Citation statements:

“…Scheduling: Lindholm et al. [30] suggest that the warp scheduler used in NVIDIA GPUs has zero-cycle overhead, and warps can be scheduled according to their pre-determined priorities. Since the difference between PA and TL schedulers is primarily in the fetch group formation approach, the hardware overhead of our proposal is similar to that of the TL scheduler.…”
[Interleaved table residue, DRAM timing parameters from [10]: tCL = 10, tRP = 10, tRC = 35, tRAS = 25, tRCD = 12, tRRD = 8, tCDLR = 6, tWR = 11]
Section: Hardware Overhead
Mentioning, confidence: 99%
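
The recovered table fragment and the scheduling claim translate naturally into code. The sketch below (host-side CUDA/C++) packs the DRAM timings quoted from [10] into a config struct and shows a toy pick-highest-priority-ready-warp loop. It illustrates "schedule warps by pre-determined priority" only; it is not the PA or TL scheduler from the citing paper, and the Warp fields are assumptions.

```cuda
#include <cstdint>

struct DramTiming {                 // values in memory-clock cycles, from [10]
    int tCL = 10, tRP = 10, tRC = 35, tRAS = 25;
    int tRCD = 12, tRRD = 8, tCDLR = 6, tWR = 11;
};

struct Warp { int id; int priority; bool ready; };   // illustrative fields

// Return the index of the ready warp with the highest priority, or -1 if none.
int pickWarp(const Warp *warps, int n) {
    int best = -1;
    for (int i = 0; i < n; ++i)
        if (warps[i].ready && (best < 0 || warps[i].priority > warps[best].priority))
            best = i;
    return best;
}
```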
“…Storing copies of a few registers for transactional threads on a CPU core is relatively cheap. For GPUs, however, with thousands of threads running, naively checkpointing large register files would incur significant overhead [Fung et al. 2011]. Therefore, it is not practical to use traditional CPU checkpointing mechanisms on the GPU.…”
Section: Paragon Overview
Mentioning, confidence: 99%
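
To see why naive register checkpointing is expensive on a GPU, a back-of-the-envelope calculation helps. The numbers below (resident threads per SM, live registers per thread, SM count) are illustrative assumptions, roughly Fermi-era, not figures from the cited papers.

```cuda
#include <cstdio>

int main() {
    const long threadsPerSM  = 1536;   // resident threads per SM (assumption)
    const long regsPerThread = 21;     // average live registers (assumption)
    const long bytesPerReg   = 4;
    const long numSMs        = 16;     // SMs on the chip (assumption)

    long perSM = threadsPerSM * regsPerThread * bytesPerReg;   // ~126 KiB per SM
    printf("checkpoint per SM: %ld KiB, chip-wide: %ld KiB\n",
           perSM / 1024, perSM * numSMs / 1024);
    return 0;
}
```

Even with these modest assumptions, a full checkpoint approaches the size of the register file itself on every SM, which is the overhead the quoted passage is pointing at.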
“…Recent works [Cederman et al. 2010; Fung et al. 2011] proposed software and hardware transactional memory systems for graphics engines. In these works, each thread is a transaction, and if a transaction aborts, it needs to re-execute.…”
Section: Related Work
Mentioning, confidence: 99%
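
The abort-and-re-execute model the quote describes can be sketched with a retry loop. Below, a CUDA spin lock built from atomicCAS stands in for hardware begin/commit: failing to acquire the lock plays the role of an abort, and the loop is the re-execution. This illustrates the programming model only, not either cited design; the lock variable and kernel are hypothetical.

```cuda
__device__ int txLock = 0;

__global__ void txUpdate(int *shared_counter) {
    bool done = false;
    while (!done) {                            // "re-execute until commit"
        if (atomicCAS(&txLock, 0, 1) == 0) {   // try to enter the transaction
            *shared_counter += 1;              // transactional body
            __threadfence();                   // publish writes before releasing
            atomicExch(&txLock, 0);            // "commit"
            done = true;
        }                                      // else: "abort", loop retries
    }
}
```

The if-inside-while shape (rather than spinning directly on atomicCAS) is the usual way to keep a SIMT warp from deadlocking on a lock held by one of its own lanes.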
“…In addition to traditional CPU architectures, massively parallel processing (MPP) architectures such as GPUs play an important cooperating role in performing highly intensive computation. When dynamic memory allocation is ported from the CPU to a massively parallel environment, it can suffer performance problems, such as memory-transaction latency or thread synchronization, that become a bottleneck and reduce total computing power [1]. Therefore, a suitable memory-management scheme is needed to handle dynamic memory allocation on MPP architectures.…”
Section: Introduction
Mentioning, confidence: 99%
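
A concrete instance of the allocation pattern the quote discusses: CUDA supports device-side malloc/free (compute capability 2.0 and later), and many threads calling the allocator concurrently is exactly where the latency and synchronization costs show up. The kernel, launch configuration, and heap size below are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void perThreadAlloc(int n) {
    // Every thread allocates its own buffer; concurrent calls contend
    // on the device heap, the bottleneck the quoted passage describes.
    int *buf = (int *)malloc(n * sizeof(int));
    if (buf == nullptr) return;                // device heap exhausted
    for (int i = 0; i < n; ++i) buf[i] = threadIdx.x;
    free(buf);
}

int main() {
    // The device heap must be sized before launch (the default is small).
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 << 20);   // 64 MiB
    perThreadAlloc<<<64, 256>>>(32);
    cudaDeviceSynchronize();
    return 0;
}
```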