Proceedings of the 2016 International Conference on Supercomputing
DOI: 10.1145/2925426.2926253

Tag-Split Cache for Efficient GPGPU Cache Utilization

Abstract: Modern GPUs employ caches to improve memory system efficiency. However, a large amount of cache space is underutilized due to irregular memory accesses and the poor spatial locality commonly exhibited by GPU applications. Our experiments show that using smaller cache lines can improve cache space utilization, but doing so also frequently incurs significant performance loss by introducing a large number of extra cache requests. In this work, we propose a novel cache design named tag-split cache (TSC) that enable…
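The tradeoff the abstract describes (finer cache lines improve space utilization but can multiply the number of requests) can be sketched numerically. The toy Python model below is an illustration of that tradeoff only, not the paper's TSC design; the 128 B / 32 B granularities and the access patterns are assumptions chosen for the example.

```python
# Toy model (an illustration, not the paper's TSC mechanism): compare how much
# of the fetched data is actually used for coarse vs. fine cache lines.

def simulate(line_size, addresses, access_bytes=4):
    """Return (number of lines fetched, fraction of fetched bytes used)."""
    lines = {addr // line_size for addr in addresses}   # distinct lines touched
    used = len(addresses) * access_bytes                # bytes the warp needs
    fetched = len(lines) * line_size                    # bytes brought into cache
    return len(lines), used / fetched

# A 32-thread warp, each thread loading 4 bytes.
strided   = [tid * 128 for tid in range(32)]  # poor spatial locality
coalesced = [tid * 4   for tid in range(32)]  # perfect spatial locality

print(simulate(128, strided))    # (32, 0.03125): ~97% of each 128 B line wasted
print(simulate(32,  strided))    # (32, 0.125):   finer lines waste 4x less space
print(simulate(128, coalesced))  # (1, 1.0):      one coarse request suffices
print(simulate(32,  coalesced))  # (4, 1.0):      finer lines need 4x the requests
```

The strided case shows why finer storage granularity helps utilization; the coalesced case shows the extra requests that motivate keeping a coarse access granularity, which is the combination TSC aims for.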

Cited by 9 publications (3 citation statements)
References 35 publications
“…(B) Memory Divergence Frequency and Degree Significance. GPU memory divergence can significantly bottleneck performance, thus becomes a popular research topic in recent years [33,35,44,49,52]. It is also an important indicator on whether a program is well optimized for memory access.…”
Section: Case Studies
confidence: 99%
“…This is because modern GPUs have very limited L1 data cache and massively threaded GPU applications often exceed the L1 capacity, causing severe thrashing [24,43]. Additionally, cache-level resources (e.g., MSHR entries and load/store queues) are also very limited, often causing severe resource congestion (e.g., MSHR allocation failures) [30,32,33]. To tackle this problem, many architecture solutions are provided, e.g., enabling bypassing threshold in tag store [32] and proposing new bypassing policy [31].…”
Section: Case Studies
confidence: 99%
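The MSHR-congestion point in the quote above (misses stall when no MSHR entry is free, motivating L1 bypassing) can be sketched as a minimal policy. The class, the MSHR count, and the return labels below are hypothetical illustrations, not the exact mechanisms of the cited papers.

```python
# Hypothetical sketch of an MSHR-aware bypass policy (names and sizes are
# assumptions): when every MSHR entry is busy, route the miss around the L1
# instead of stalling the warp on MSHR allocation failure.

class L1Cache:
    def __init__(self, num_mshrs=4):
        self.num_mshrs = num_mshrs
        self.outstanding = set()   # line addresses of in-flight misses (MSHRs in use)
        self.lines = set()         # resident line addresses

    def access(self, line_addr):
        if line_addr in self.lines:
            return "hit"
        if line_addr in self.outstanding:
            return "merged"        # secondary miss merged into an existing MSHR
        if len(self.outstanding) < self.num_mshrs:
            self.outstanding.add(line_addr)
            return "miss"          # MSHR allocated, fill is in flight
        return "bypass"            # MSHRs exhausted: skip the L1, go to L2 directly

cache = L1Cache(num_mshrs=2)
results = [cache.access(a) for a in [0, 1, 2, 0]]
print(results)  # ['miss', 'miss', 'bypass', 'merged']
```

With only two MSHRs, the third distinct miss bypasses the L1 rather than failing allocation, while the repeated address merges into its pending entry; real policies add thresholds and selectivity on top of this basic idea.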
“…Rhu et al [2013] proposed a locality-aware memory hierarchy that adaptively adjusts the memory access granularity to prevent overfetching, providing better off-chip bandwidth utilization. Furthermore, with regard to adaptive memory access granularity, Li et al [2016] proposed a tag-split cache to enable fine storage granularity to improve cache utilization while keeping a coarse access granularity to avoid excessive cache requests. proposed a scheme to tolerate memory miss latencies for SIMD cores by masking out threads in a warp that are waiting on data and allowing other threads to continue execution, hence utilizing the idle execution slots.…”
Section: Cache Management
confidence: 99%