Chip multiprocessor (CMP) systems have made the on-chip caches a critical resource shared among co-scheduled threads. Limited off-chip bandwidth, increasing on-chip wire delay, destructive inter-thread interference, and diverse workload characteristics pose key design challenges. To address these challenges, we propose CMP cooperative caching (CC), a unified framework to efficiently organize and manage on-chip cache resources. By forming a globally managed, shared cache out of cooperative private caches, CC can effectively support two important caching applications: (1) reduction of average memory access latency and (2) isolation of destructive inter-thread interference.

CC reduces the average memory access latency by balancing cache latency and capacity optimizations. Because it is built on private caches, CC naturally exploits their access latency benefits. To improve the effective cache capacity, CC forms a "shared" cache using replication control and LRU-based global replacement policies. Via cooperation throttling, CC provides a spectrum of caching behaviors between the two extremes of private and shared caches, thus enabling dynamic adaptation to suit workload requirements. We show that CC achieves a robust performance advantage over private and shared cache schemes across different processor, cache, and memory configurations, and a wide selection of multithreaded and multiprogrammed workloads.

To isolate inter-thread caching interference, we add a time-sharing aspect on top of spatial cache partitioning. Our approach uses Multiple Time-sharing Partitions (MTP) to simultaneously improve throughput and fairness while maintaining QoS over the longer term. Each MTP partition unfairly improves at least one thread's throughput, and partitions favoring different threads are scheduled in a cooperative, time-sharing manner to either maintain fairness and QoS, or implement priority. We also integrate MTP with CC's LRU-based capacity sharing policy to combine their benefits. The integrated scheme, Cooperative Caching Partitioning (CCP), divides the total execution epochs into those controlled by MTP and those controlled by the baseline CC policy, according to the fraction of threads that can benefit from each. Our simulation results show that, for a wide range of multiprogrammed workloads, CCP improves throughput, fairness, and QoS for workloads suffering from destructive interference, while achieving the performance benefit of the baseline CC policy for other workloads.
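To make the cooperation throttling idea concrete, the following C sketch models one plausible realization: a per-cache cooperation probability applied at private-cache eviction time. All names here (spill_to_peer, coop_prob, on_private_eviction) are illustrative assumptions, not identifiers from the dissertation's implementation.

```c
#include <stdlib.h>

typedef struct { unsigned long tag; } cache_block_t;

/* Hypothetical hook: place an evicted block into a peer core's private
 * cache (e.g., a randomly chosen neighbor); stubbed out in this sketch. */
static void spill_to_peer(int core_id, cache_block_t *victim)
{
    (void)core_id; (void)victim;  /* placeholder for the real spill path */
}

/* Called when core `core_id` evicts `victim` from its private cache. */
static void on_private_eviction(int core_id, cache_block_t *victim,
                                double coop_prob)
{
    /* Throttle cooperation: spill with probability coop_prob.
     * coop_prob = 0.0 -> pure private caches (never keep victims on chip);
     * coop_prob = 1.0 -> the aggregate approximates one shared cache. */
    if ((double)rand() / (double)RAND_MAX < coop_prob)
        spill_to_peer(core_id, victim);  /* shared-like: retain on chip */
    /* otherwise drop the block, as an ordinary private cache would */
}
```

Intermediate probabilities yield the spectrum of behaviors between the private and shared extremes that the throttling mechanism is meant to expose.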
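Similarly, the CCP epoch division can be sketched as a proportional split, assuming each thread has already been classified as benefiting from MTP or from the baseline CC policy (the classification itself is not shown, and all names are illustrative).

```c
#include <stdio.h>

typedef enum { POLICY_MTP, POLICY_CC } epoch_policy_t;

/* Split `total_epochs` between MTP and baseline CC in proportion to the
 * number of threads each policy benefits. */
static void plan_epochs(int total_epochs, int mtp_threads, int cc_threads,
                        epoch_policy_t plan[])
{
    int n = mtp_threads + cc_threads;
    int mtp_epochs = (n > 0) ? total_epochs * mtp_threads / n : 0;

    for (int e = 0; e < total_epochs; e++)
        plan[e] = (e < mtp_epochs) ? POLICY_MTP : POLICY_CC;
}

int main(void)
{
    /* Example: 3 of 4 threads suffer destructive interference (MTP helps),
     * while 1 thread is better served by the baseline CC policy. */
    epoch_policy_t plan[8];
    plan_epochs(8, 3, 1, plan);
    for (int e = 0; e < 8; e++)
        printf("epoch %d: %s\n", e, plan[e] == POLICY_MTP ? "MTP" : "CC");
    return 0;
}
```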