Proceedings of the 53rd Annual Design Automation Conference 2016
DOI: 10.1145/2897937.2897966
A model-driven approach to warp/thread-block level GPU cache bypassing

Abstract: The high volume of memory requests from massive numbers of threads can easily cause cache contention and cache-miss-related resource congestion on GPUs. This paper proposes a simple yet effective performance model to estimate the impact of cache contention and resource congestion as a function of the number of warps/thread blocks (TBs) that bypass the cache. We then design a hardware-based dynamic warp/thread-block level GPU cache bypassing scheme, which achieves a 1.68x speedup on average on a set of memory-intensive benchmarks…
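The core idea in the abstract — evaluate a performance model for each candidate number of bypassing warps, then pick the count that minimizes estimated cost — can be sketched as follows. The cost model below is purely illustrative (the latencies, hit-rate curve, and congestion penalty are invented assumptions, not the paper's actual model); only the selection structure mirrors the approach described.

```python
# Hypothetical sketch: score each candidate number of bypassing warps
# with a toy cost model, then choose the count with the lowest cost.
# All constants and the hit-rate/congestion formulas are assumptions.

def estimated_cost(n_bypass, n_warps, hit_latency=30, miss_latency=350,
                   mshr_capacity=32):
    """Toy cost: bypassing warps always pay miss latency; cached warps
    hit more often as contention drops, but excess outstanding misses
    beyond MSHR capacity add a congestion penalty."""
    n_cached = n_warps - n_bypass
    # Hit rate improves as fewer warps share the cache (illustrative curve).
    hit_rate = min(0.9, 0.2 + 0.7 * n_bypass / n_warps)
    cached_cost = n_cached * (hit_rate * hit_latency
                              + (1 - hit_rate) * miss_latency)
    # Outstanding misses past MSHR capacity model miss-related congestion.
    outstanding = n_bypass + n_cached * (1 - hit_rate)
    congestion = max(0.0, outstanding - mshr_capacity) * 20
    return cached_cost + n_bypass * miss_latency + congestion

def best_bypass_count(n_warps):
    """Pick the number of bypassing warps with the lowest estimated cost."""
    return min(range(n_warps + 1), key=lambda n: estimated_cost(n, n_warps))

print(best_bypass_count(48))
```

In the paper's hardware scheme this selection is made dynamically at run time; the sketch only shows the model-driven "choose how many warps/TBs bypass" decision in its simplest offline form.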

Cited by 17 publications (6 citation statements)
References 31 publications
“…As for GPGPU workloads, many more works have targeted cache locality to improve performance. There are many works in literature [9], [24], [29], [31], [32], [47], [50], [56], [57], [59], [61] that explore cache bypassing to improve GPU cache locality. Some works have targeted cache locality across kernel launches for parent-child kernels [52] or generic dependent kernels [16].…”
Section: Related Work
confidence: 99%
“…However, on the GPU [15]–[22], the model based on the cache hit rate does not always perform well due to the GPU's unique architectural characteristics, including massive parallelism, resource congestion, and memory divergence. A model-driven approach was developed by [23], which dynamically estimates the impact of cache contention and resource congestion as a function of the number of warps/thread blocks (TBs) that bypass the cache. Xie et al. [17] proposed a compiler-based method to access or bypass the cache by analyzing reuse distance and memory traffic.…”
Section: Related Work
confidence: 99%
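The reuse-distance analysis mentioned in the statement above can be illustrated with a short sketch: the (stack) reuse distance of an access is the number of distinct addresses touched since the previous access to the same address, and accesses whose reuse distance exceeds the cache capacity are natural bypass candidates. The function and trace below are hypothetical illustrations, not the cited authors' implementation.

```python
# Illustrative reuse-distance computation over an address trace.
# Loads whose reuse distance exceeds the cache capacity (in lines)
# could be flagged as bypass candidates by a compiler or profiler.

def reuse_distances(trace):
    last_seen = {}
    distances = []  # None marks first-time (cold) accesses
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Count distinct addresses accessed since the last use of addr.
            window = trace[last_seen[addr] + 1:i]
            distances.append(len(set(window)))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances

trace = ['A', 'B', 'C', 'A', 'B', 'D', 'A']
print(reuse_distances(trace))  # → [None, None, None, 2, 2, None, 2]
```

This quadratic formulation is the textbook definition; production tools typically use a tree or sampling structure to compute reuse distances in near-linear time.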
“…Prior work uses GPU modeling techniques to guide runtime optimizations (e.g., DVFS configuration [15] and cache miss-related optimizations [16]) or GPU resource scaling analysis [2]. Our work provides an accurate model for fast design space exploration.…”
Section: Related Work
confidence: 99%