Proximity coherence for chip multiprocessors

Barrow-Williams, Nick; Fensch, Christian; Moore, Simon W.

doi:10.1145/1854273.1854293

Cited by 20 publications

(6 citation statements)

References 24 publications

“…The research works most closely related to CCM are victim or replication strategies [6,12,16,20,26,31,32], Cooperative Caching strategies [1,5,10,11,13,17,18], and to a lesser extent, hierarchical directory coherence [8,14,15,22,33,34].…”

Section: Related Work (mentioning)

confidence: 99%

Cluster Cache Monitor: Leveraging the Proximity Data in CMP

Temam, Liu, et al. 2014

Int J Parallel Prog

As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies because of the longer average distance between two nodes, and the potential congestions at certain nodes. One of the main causes of the long L1 miss latencies are accesses to home nodes of the directory. However, we have observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. We organize the multi-core into clusters of 2 × 2 nodes, and in order to leverage the aforementioned property, we introduce the Cluster Cache Monitor (CCM). The CCM is a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches, and two cluster-related states are added in the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce the execution time by 15 % and reduce the energy by 14 %, while saving 28 % of the directory storage area compared to a standard multi-core with a shared L2. We also show that the CCM outperforms recent mechanisms, such as ASR, DCC and RNUCA.
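The clustering idea in the abstract above can be illustrated with a rough sketch. It is not taken from the paper: it assumes the 64 nodes form an 8 × 8 mesh with row-major node numbering and that distance is counted in Manhattan hops, and all function names are hypothetical. It only shows why a hit inside a 2 × 2 cluster is at most two hops away, while a home node may be much farther.

```python
# Hypothetical sketch; the mesh layout and numbering are assumptions, not from the paper.
MESH_WIDTH = 8      # assumption: 64 nodes arranged as an 8 x 8 mesh
CLUSTER_SIDE = 2    # from the abstract: clusters of 2 x 2 nodes


def coords(node):
    """Return (x, y) for a node id, assuming row-major numbering."""
    return node % MESH_WIDTH, node // MESH_WIDTH


def cluster_id(node):
    """Map a node to the id of the 2 x 2 cluster that contains it."""
    x, y = coords(node)
    clusters_per_row = MESH_WIDTH // CLUSTER_SIDE
    return (y // CLUSTER_SIDE) * clusters_per_row + (x // CLUSTER_SIDE)


def cluster_members(node):
    """List every node that shares a cluster with the given node."""
    cid = cluster_id(node)
    return [n for n in range(MESH_WIDTH * MESH_WIDTH) if cluster_id(n) == cid]


def hops(a, b):
    """Manhattan distance between two nodes on the mesh."""
    ax, ay = coords(a)
    bx, by = coords(b)
    return abs(ax - bx) + abs(ay - by)


def max_cluster_hops(node):
    """Worst-case distance from a node to another L1 in its own cluster."""
    return max(hops(node, n) for n in cluster_members(node))
```

Under these assumptions, `cluster_members(0)` yields nodes 0, 1, 8 and 9, and `hops(0, 63)` shows the worst-case corner-to-corner distance that a request to a remote home node could incur.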

“…CCM requires only marginal modifications of the network interface. Barrow [5] leverages the proximity data by sending requesting messages to all neighbors, which complicates the coherence protocol. Acacio [1] and Hossain [18] also introduces the concept of using the nearby data.…”

Section: Leveraging Data Proximity (mentioning)

confidence: 99%

Cluster Cache Monitor: Leveraging the Proximity Data in CMP

Temam, Liu, et al. 2014

Int J Parallel Prog

“…Cache coherence designs to exploit the proximity of data sharers have been proposed in [6,7]. Williams et.…”

Section: Related Work and Conclusion (mentioning)

confidence: 99%

“…al. [7] propose to add direct links in four directions of NoC routers to snoop sharers in direct neighbors. However, their scheme depends on specific application mapping to work and has more hardware overhead.…”

Section: Related Work and Conclusion (mentioning)

confidence: 99%

“…As a result, if the bus configurations are fixed as in [10] and [7], the effectiveness of the snooping will be compromised since the number of possible shares searched is not related to the bus length. In another word, even if we increase the length of snooping buses, the sharers found may not increase accordingly.…”

Section: B1 Mapping Of Parallel Programs Onto a CMP Platform (mentioning)

confidence: 99%

A hybrid NoC design for cache coherence optimization for chip multiprocessors

Hui, Jang, Ding, et al. 2012

Proceedings of the 49th Annual Design Automation Conference

On chip many-core systems, evolving from prior multi-processor systems, are considered as a promising solution to the performance scalability and power consumption problems. The long communication distance between the traditional multi-processors makes directory-based cache coherence protocols better solutions compared to bus-based snooping protocols even with the overheads from indirections. However, much smaller distances between the CMP cores enhance the reachability of buses, revitalizing the applicability of snooping protocols for cache-to-cache transfers. In this work, we propose a hybrid NoC design to provide optimized support for cache coherency. In our design, on-chip links can be dynamically configured as either point-to-point links between NoC nodes or short buses to facilitate localized snooping. By taking advantage of the best of both worlds, bus-based snooping coherency and NoC-based directory coherency, our approach brings both power and performance benefits.

Improving performance through deep value profiling and specialization with code transformation

Khan 2011

Computer Languages, Systems & Structures
