IEEE International Symposium on High-Performance Computer Architecture (HPCA) 2012
DOI: 10.1109/hpca.2012.6168947
TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture

Cited by 111 publications (94 citation statements: 1 supporting, 93 mentioning, 0 contrasting). References 15 publications.
“…PDP-G and PDP-S achieve average IPC improvement of 44.6% and 45.4% respectively, which are close to that of PDP-P. PDP-S performs very similarly to PDP-P for most of the benchmarks. This is because GPU programs usually have similar behavior for all the threads [27], and the optimal PD estimated by one of the SIMT cores is probably also the optimal PD for the rest. In the following sections, we adopt the PDP-S design since it is the cheapest one.…”
Section: Cache Bypassing on GPUs (mentioning, confidence: 99%)
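
To make the protection-distance (PD) mechanism behind this excerpt concrete, here is a minimal Python sketch of PD-based bypassing: each cache line carries a remaining-protection counter, a line becomes evictable only once its counter reaches zero, and a miss that finds no unprotected victim bypasses the cache entirely. The fixed pd value stands in for the estimate that, per the excerpt's PDP-S design, a single sampling SIMT core computes and shares with its peers; all class and variable names are illustrative assumptions, not from the cited paper.

class PDCacheSet:
    def __init__(self, ways: int, pd: int):
        self.ways = ways
        self.pd = pd          # protection distance (assumed given by a sampler core)
        self.lines = {}       # tag -> remaining protection counter

    def access(self, tag) -> str:
        # Every access to the set ages all resident lines by one.
        for t in self.lines:
            if self.lines[t] > 0:
                self.lines[t] -= 1
        if tag in self.lines:
            self.lines[tag] = self.pd   # hit: re-protect the line
            return "hit"
        # Miss: fill a free or unprotected way, otherwise bypass.
        if len(self.lines) < self.ways:
            self.lines[tag] = self.pd
            return "miss-fill"
        victims = [t for t, c in self.lines.items() if c == 0]
        if victims:
            del self.lines[victims[0]]
            self.lines[tag] = self.pd
            return "miss-replace"
        return "miss-bypass"            # all lines still protected

if __name__ == "__main__":
    s = PDCacheSet(ways=2, pd=3)
    for addr in [1, 2, 3, 1, 4, 2]:
        print(addr, s.access(addr))     # the 3rd and 6th accesses bypass

Because the bypass decision depends only on the shared pd value, estimating it on one SIMT core and reusing it everywhere (the PDP-S choice the excerpt adopts) is cheap whenever threads behave alike.
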
“…Although higher TLP provides better latency hiding capability, it has been observed that increased TLP sometimes may hurt the performance due to the cache contention problem [22]. To address this performance anomaly, we proposed to either use the dynamic SM-dueling approach [14] or a simple static threshold to limit the number of active warps. More details are discussed in Section 6.1.…”
Section: Figure 14: The Workload Buffer (mentioning, confidence: 99%)
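
A minimal sketch of the static-threshold variant this excerpt mentions: cap the number of warps eligible to issue, trading some latency-hiding TLP for less cache contention. MAX_ACTIVE_WARPS is a hypothetical cap chosen for illustration; the excerpt's SM-dueling alternative would instead pick the cap dynamically by comparing the performance of SMs running with different limits.

MAX_ACTIVE_WARPS = 8   # assumed static threshold, not a value from the paper

def schedulable_warps(ready_warps: list[int]) -> list[int]:
    """Return the subset of ready warps allowed to issue this cycle.

    Warps beyond the cap stay parked, so their working sets do not
    compete for cache capacity until an active warp completes.
    """
    return ready_warps[:MAX_ACTIVE_WARPS]

print(schedulable_warps(list(range(16))))   # only warps 0..7 may issue
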
“…Prior efforts [8,10,15,16,17,26,29,30,31] demonstrate that horizontal partitioning on memory or LLC is effective in eliminating inter-program interference and improving performance. With vertical partitioning and, more generally, our partitioning policy space, one important question is whether the benefits from the horizontal memory and cache partitioning can be accumulated (i.e., should we go vertical in partitioning?…”
Section: Going Vertical? (mentioning, confidence: 99%)
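
To illustrate the distinction this excerpt draws, here is a small Python sketch: horizontal partitioning splits a single resource level (here, LLC ways) across programs, while a vertical policy assigns each program a coordinated slice of every level (here, LLC ways and memory channels together). The resource sizes, program names, and proportional-share rule are illustrative assumptions, not from the cited papers.

LLC_WAYS, MEM_CHANNELS = 16, 4

def horizontal_llc_partition(shares: dict[str, float]) -> dict[str, int]:
    """Split only the LLC ways in proportion to each program's share."""
    return {p: max(1, round(s * LLC_WAYS)) for p, s in shares.items()}

def vertical_partition(shares: dict[str, float]) -> dict[str, tuple[int, int]]:
    """Give each program a slice of ways AND channels (rounding may
    over-allocate slightly; a real policy would rebalance)."""
    return {p: (max(1, round(s * LLC_WAYS)),
                max(1, round(s * MEM_CHANNELS)))
            for p, s in shares.items()}

shares = {"cpu_app": 0.25, "gpu_app": 0.75}
print(horizontal_llc_partition(shares))   # {'cpu_app': 4, 'gpu_app': 12}
print(vertical_partition(shares))         # {'cpu_app': (4, 1), 'gpu_app': (12, 3)}
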
“…In particular, Qureshi et al [32] design a utility-based cache partitioning scheme that allocates appropriate cache resources based on application miss rate monitored through dedicated hardware. More recently, cache partitioning is also adopted in heterogeneous GPU-CPU architectures to promote fair resource sharing among CPU and GPU applications [30], which exhibit drastically different memory access characteristics. Other efforts [3,9,12,15,25,28] classify workloads based on hardware profiling, and then choose appropriate scheduling policies for different classifications.…”
Section: Related Work (mentioning, confidence: 99%)
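
The utility-based scheme attributed to Qureshi et al. in this excerpt can be sketched as a greedy way allocator: given each application's hit curve (hits as a function of allocated ways, which UCP gathers with sampled shadow-tag monitors in hardware), repeatedly hand one way to whichever application gains the most hits from it. The hit-curve numbers below are made-up illustrative data; the curves echo the common CPU-GPU contrast where a streaming GPU workload gains little from extra cache.

def utility_partition(hit_curves: dict[str, list[int]], total_ways: int):
    alloc = {app: 0 for app in hit_curves}
    for _ in range(total_ways):
        def gain(app):
            # Marginal utility of one more way for this application.
            w, curve = alloc[app], hit_curves[app]
            return curve[w + 1] - curve[w] if w + 1 < len(curve) else 0
        alloc[max(alloc, key=gain)] += 1
    return alloc

# hits[w] = hits observed with w ways (index 0 = zero ways)
curves = {
    "cpu": [0, 40, 70, 85, 92, 96, 98, 99, 100],  # cache-sensitive
    "gpu": [0, 12, 14, 15, 15, 15, 15, 15, 15],   # streaming, insensitive
}
print(utility_partition(curves, total_ways=8))    # {'cpu': 6, 'gpu': 2}

This greedy loop captures why such schemes promote fairness between CPU and GPU applications: ways flow to whichever side can actually convert capacity into hits, rather than to whichever side issues the most accesses.
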