Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture 2013
DOI: 10.1145/2540708.2540742

Efficient management of last-level caches in graphics processors for 3D scene rendering workloads

Abstract: Three-dimensional (3D) scene rendering is implemented as a pipeline in graphics processing units (GPUs). Different stages of the pipeline access different types of data, including vertex, depth, stencil, render target (i.e., pixel color), and texture sampler data. GPUs traditionally include small caches for vertex, render target, depth, and stencil data, as well as multi-level caches for the texture sampler units. Recent introduction of reasonably large last-level …

Cited by 22 publications (8 citation statements)
References: 42 publications
“…They contain a wide range of applications that fall into various research categories. The selected applications in Table 2 are also used in previous studies [4, 10, 16, 23–26, 28, 30–33, 42–45, 53–55].…”
Section: Discussion (mentioning)
confidence: 99%
“…NVIDIA provides its own tools to support profiling CUDA code, such as Visual Profiler (NVP) [16], nvprof [1], and NSight [12]. These profilers collect performance data via hardware performance counters and lightweight binary instrumentation.…”
Section: Related Work (mentioning)
confidence: 99%
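
A minimal sketch of the kind of profiling the statement above describes: running nvprof against a trivial CUDA kernel so the tool can read hardware performance counters. The kernel, file name, and exact metric names are illustrative assumptions (available metrics vary by GPU architecture and nvprof version; `nvprof --query-metrics` lists the valid ones).

// copy.cu -- toy kernel used only to have something to profile (assumed name)
#include <cuda_runtime.h>
#include <cstdio>

__global__ void copy(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];   // one global load and one global store per thread
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    copy<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    printf("done\n");
    return 0;
}

// Build and profile (metric names are assumptions for illustration):
//   nvcc -o copy copy.cu
//   nvprof --metrics gld_efficiency,l2_tex_read_hit_rate ./copy
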
“…It also employs a reactive bypassing scheme. Some work focuses on improving GPU cache performance through novel cache replacement methods [6, 7, 12, 31, 36, 37]. A decoupled GPU L1 cache is proposed in [16] to enable dynamic locality filtering functionality in the extended tag store for efficient and accurate runtime cache bypassing.…”
Section: Related Work (mentioning)
confidence: 99%
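
The works cited above bypass the cache dynamically at run time. A minimal sketch of the static analogue in CUDA follows, where individual loads are steered past the L1 data cache with the PTX .cg cache operator or routed through the read-only data path with __ldg(); the kernel and variable names are illustrative assumptions, not code from the cited papers.

#include <cuda_runtime.h>

// Load through L2 only, bypassing L1, via the PTX ld.global.cg cache operator.
__device__ float load_l2_only(const float *p) {
    float v;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(p));
    return v;
}

__global__ void scale(const float *in, float *out, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float streaming = load_l2_only(in + i);  // streaming data with no reuse: skip L1
    float reused    = __ldg(in + i);         // read-only data path, cached for reuse
    out[i] = k * (streaming + reused);
}

Here the choice of which loads bypass L1 is fixed at compile time; the cited schemes instead make that decision at run time based on observed locality.
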
“…A unified GPU on-chip memory design is proposed by Gebhart et al. [14] to satisfy varying capacity needs across different applications. LLC management policies for 3D scene rendering workloads on GPUs are explored by Gaur et al. [13], while our work focuses on general-purpose applications. Some other work studied cache management schemes for heterogeneous systems [27], [31].…”
Section: B. GPU Cache Management (mentioning)
confidence: 99%