2014
DOI: 10.1109/mm.2014.4

Cache Coherence for GPU Architectures

Abstract: While scalable coherence has been extensively studied in the context of general-purpose chip multiprocessors (CMPs), GPU architectures present a new set of challenges. Introducing conventional directory protocols adds unnecessary coherence traffic overhead to existing GPU applications. Moreover, these protocols increase the verification complexity of the GPU memory system. Recent research, Library Cache Coherence (LCC) [34,54], explored the use of time-based approaches in CMP coherence protocols. This paper desc…

Cited by 39 publications (56 citation statements)
References 40 publications
“…GPU L1 caches typically feature a write-through policy, with [1] or without [33], [35] write-allocation. This policy saves bandwidth compared to a write-back policy [41], [16], since GPU applications have very little reuse on written data. The L2 cache is write-back with write-allocation, which is the same design choice as a conventional CPU LLC.…”
Section: Baseline GPU Architecture (mentioning)
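To make the write policies in the quoted passage concrete, here is a minimal sketch of a toy two-level hierarchy, assuming single-word "lines", a dict standing in for DRAM, and no capacity limits or evictions. The class names `L1WriteThrough` and `L2WriteBack` and the `write_allocate` flag are hypothetical; the flag toggles between the two L1 variants the quote mentions. It shows that every L1 store is forwarded to L2, while the write-back L2 absorbs repeated stores and writes the line to DRAM only once.

```python
class L2WriteBack:
    """Toy write-back, write-allocate L2 sitting in front of DRAM (a dict)."""

    def __init__(self, dram):
        self.dram = dram
        self.lines = {}       # addr -> cached value
        self.dirty = set()    # addrs modified since they were filled
        self.dram_writes = 0

    def read(self, addr):
        if addr not in self.lines:             # miss: fill the line from DRAM
            self.lines[addr] = self.dram.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value               # write-allocate: install on a write miss
        self.dirty.add(addr)                   # DRAM is not touched until write-back

    def flush(self):
        # Write each dirty line back once, however many stores it absorbed.
        for addr in sorted(self.dirty):
            self.dram[addr] = self.lines[addr]
            self.dram_writes += 1
        self.dirty.clear()


class L1WriteThrough:
    """Toy write-through L1; write_allocate picks the with/without variant."""

    def __init__(self, l2, write_allocate):
        self.l2 = l2
        self.write_allocate = write_allocate
        self.lines = {}
        self.l2_writes = 0

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.l2.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        # Update the local copy if present (or install it, with write-allocation)...
        if addr in self.lines or self.write_allocate:
            self.lines[addr] = value
        # ...but always forward the store to L2: that is what write-through means.
        self.l2.write(addr, value)
        self.l2_writes += 1


dram = {}
l2 = L2WriteBack(dram)
l1 = L1WriteThrough(l2, write_allocate=False)
for v in range(4):
    l1.write(0x40, v)
l2.flush()
print(l1.l2_writes, l2.dram_writes, dram[0x40], 0x40 in l1.lines)  # 4 1 3 False
```

With `write_allocate=False`, a store to a line that was never read leaves the L1 untouched, which fits the quoted observation that GPU kernels rarely reuse written data.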
“…The L2 cache is write-back with write-allocation, which is the same design choice as a conventional CPU LLC. Modern GPUs typically do not provide hardware support for L1 cache coherence to avoid the overhead that coherence messages add to NoC traffic and memory access latency [41], [16]. Current GPU L2 caches do not enforce inclusion.…”
Section: Baseline GPU Architecture (mentioning)
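The consequence of omitting hardware L1 coherence can be shown with a small sketch, assuming two cores with private write-through L1s over one shared L2 and no invalidation messages of any kind. The classes and the `invalidate()` method are hypothetical; `invalidate()` merely stands in for whatever software-managed L1 flush a real platform offers. Core A keeps hitting on its old copy after core B's store has reached L2.

```python
class SharedL2:
    """Toy shared L2 (the point of coherence); unwritten values read as 0."""

    def __init__(self):
        self.lines = {}

    def read(self, addr):
        return self.lines.get(addr, 0)

    def write(self, addr, value):
        self.lines[addr] = value


class PrivateL1:
    """Toy per-core write-through L1 with no coherence: other cores' writes
    reach L2 but never invalidate or update this cache."""

    def __init__(self, l2):
        self.l2 = l2
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.l2.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        if addr in self.lines:          # no-write-allocate: update only if present
            self.lines[addr] = value
        self.l2.write(addr, value)      # write-through to the shared L2

    def invalidate(self):
        # Hypothetical stand-in for a software-managed L1 flush.
        self.lines.clear()


l2 = SharedL2()
core_a, core_b = PrivateL1(l2), PrivateL1(l2)
first = core_a.read(0x80)    # core A caches the initial value
core_b.write(0x80, 42)       # core B's store reaches L2 only
stale = core_a.read(0x80)    # core A still hits on its old copy
core_a.invalidate()
fresh = core_a.read(0x80)    # after invalidation, A refetches from L2
print(first, stale, fresh)   # 0 0 42
```

This is the traffic-free but incoherent baseline that time-based protocols try to improve on: correctness here depends entirely on when software chooses to invalidate.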
“…One could envision scoped transactions. Singh et al. proposed temporal coherence, which is a time-based self-invalidation coherence protocol for GPUs [27]. Scopes could potentially be applied to temporal coherence to reduce self-invalidations.…”
Section: Related Work (mentioning)
“…This is a time-based coherence protocol for GPUs, namely Temporal Coherence (TC) [80], based on globally synchronized counters. With TC-Strong, these synchronized counters are maintained in the GPU cores and L2 controllers, allowing cache blocks to self-invalidate and coherence to be maintained, thus eliminating coherence traffic and reducing area overhead and protocol complexity.…”
Section: Timestamps-based Coherence (mentioning)
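The self-invalidation idea in these passages can be sketched as a lease protocol. This is a minimal toy model, not the published design. It assumes a single shared integer standing in for the globally synchronized counters, a fixed hypothetical lease length, and TC-Strong behavior as we understand it: an L2 write is stalled until every lease on that block has expired. Lease prediction, the TC-Weak variant, and real timing are not modeled, and all class and method names are hypothetical. No message ever travels from L2 to an L1; copies simply stop being valid when global time passes their expiry.

```python
class GlobalClock:
    """Stand-in for the globally synchronized counters: one shared time value."""

    def __init__(self):
        self.now = 0

    def advance(self, cycles):
        self.now += cycles


class L2Controller:
    def __init__(self, clock, lease):
        self.clock = clock
        self.lease = lease    # hypothetical fixed lifetime granted on every fill
        self.data = {}
        self.expiry = {}      # addr -> latest expiry time handed to any L1

    def read(self, addr):
        exp = self.clock.now + self.lease
        self.expiry[addr] = max(self.expiry.get(addr, 0), exp)
        return self.data.get(addr, 0), exp

    def write(self, addr, value):
        # TC-Strong-style write (assumed): wait until every L1 copy of addr has
        # expired, so no core can still read the old value once the write completes.
        stall = max(0, self.expiry.get(addr, 0) - self.clock.now)
        self.clock.advance(stall)   # toy model: the stall just moves global time on
        self.data[addr] = value
        return stall


class L1Cache:
    def __init__(self, clock, l2):
        self.clock = clock
        self.l2 = l2
        self.lines = {}       # addr -> (value, expiry)

    def read(self, addr):
        entry = self.lines.get(addr)
        if entry is not None and self.clock.now < entry[1]:
            return entry[0]   # lease still valid: hit
        # Absent or expired: the block has self-invalidated, so refetch.
        # No invalidation message from L2 was needed.
        value, exp = self.l2.read(addr)
        self.lines[addr] = (value, exp)
        return value

    def write(self, addr, value):
        self.lines.pop(addr, None)  # simplification: drop local copy, write through
        return self.l2.write(addr, value)


clock = GlobalClock()
l2 = L2Controller(clock, lease=10)
core_a, core_b = L1Cache(clock, l2), L1Cache(clock, l2)
core_a.read(0x100)               # A leases the block until time 10
stall = core_b.write(0x100, 7)   # B's write waits out A's lease
print(stall, clock.now, core_a.read(0x100))  # 10 10 7
```

The design trade-off is visible in the numbers: coherence costs no messages, but a write to a widely shared block can stall for up to one lease length, which is the kind of cost that scoping (as suggested in the related-work excerpt) might reduce.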