2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/ipdps.2018.00025

CIAO: Cache Interference-Aware Throughput-Oriented Architecture and Scheduling for GPUs

Abstract: A modern GPU aims to simultaneously execute more warps for higher Thread-Level Parallelism (TLP) and performance. When generating many memory requests, however, warps contend for limited cache space and thrash the cache, which in turn severely degrades performance. To reduce such cache thrashing, we may adopt cache locality-aware warp scheduling, which gives higher execution priority to warps with higher potential of data locality. However, we observe that warps with high potential of data locality often incur far…
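To see the baseline idea the abstract describes concretely, below is a minimal sketch of cache locality-aware warp selection. It is an illustrative simplification, not the paper's CIAO architecture: the Warp record, its hit/access counters, and the locality_score heuristic are all hypothetical choices assumed for this example.

```python
from dataclasses import dataclass

@dataclass
class Warp:
    wid: int              # warp id; lower id = older warp
    ready: bool = True    # eligible to issue this cycle
    hits: int = 0         # cache hits observed so far
    accesses: int = 0     # total cache accesses so far

    @property
    def locality_score(self) -> float:
        # Proxy for data-locality potential: observed hit rate.
        return self.hits / self.accesses if self.accesses else 0.0

def pick_next_warp(warps: list[Warp]) -> Warp | None:
    """Locality-aware scheduling: among ready warps, issue the one
    with the highest estimated locality, breaking ties in favor of
    the oldest warp (lowest id)."""
    ready = [w for w in warps if w.ready]
    if not ready:
        return None
    return max(ready, key=lambda w: (w.locality_score, -w.wid))

# A streaming warp (few hits) yields to a warp with strong reuse,
# protecting the latter's working set from cache thrashing.
warps = [Warp(0, hits=1, accesses=10), Warp(1, hits=8, accesses=10)]
assert pick_next_warp(warps).wid == 1
```

Note that the abstract's final, truncated sentence signals a downside to exactly this policy, so the sketch should be read as the baseline the paper critiques rather than its proposed solution.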

Cited by 6 publications (1 citation statement)
References 27 publications
“…Effective utilization of intra- and inter-warp data locality can improve on-chip cache hit rate, mitigate cache interference [6], reduce the number of costly off-chip accesses, and improve GPU performance [7]. The traditional LRR (Loose Round Robin) and GTO (Greedy Then Oldest) scheduling algorithms preserve inter-warp locality and intra-warp locality, respectively.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
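To make the cited contrast between the two baseline schedulers concrete, here is a minimal sketch of both selection policies. It is an illustrative simplification, not any GPU's actual scheduler implementation; the dictionary-based warp records and their "ready", "greedy", and "age" fields are assumptions made for the example.

```python
from collections import deque

def lrr_select(warps: deque) -> int:
    """Loose Round Robin: rotate through warps and issue from the
    next ready one. All warps progress roughly in lockstep, so data
    fetched by one warp is soon reused by its peers (inter-warp
    locality)."""
    for _ in range(len(warps)):
        w = warps[0]
        warps.rotate(-1)       # advance the rotation either way
        if w["ready"]:
            return w["wid"]
    return -1                  # nothing ready this cycle

def gto_select(warps: list) -> int:
    """Greedy Then Oldest: keep issuing from the current warp until
    it stalls, then switch to the oldest ready warp. A single warp's
    working set stays hot in cache (intra-warp locality)."""
    greedy = next((w for w in warps if w["greedy"]), None)
    if greedy is not None and greedy["ready"]:
        return greedy["wid"]
    # Current warp stalled: the oldest ready warp becomes greedy.
    for w in sorted(warps, key=lambda w: w["age"], reverse=True):
        if w["ready"]:
            for other in warps:
                other["greedy"] = False
            w["greedy"] = True
            return w["wid"]
    return -1
```

The design difference is visible in the control flow: LRR always advances the rotation, spreading issue slots evenly, while GTO sticks with one warp until a stall forces a switch, which is why each policy favors a different kind of locality.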