How to citeComplete issue More information about this article Journal's homepage in redalyc.org Scientific Information System Network of Scientific Journals from Latin America, the Caribbean, Spain and Portugal Non-profit academic project, developed under the open access initiative
As more processing cores are integrated into one chip and feature size continues to shrink, the average access latency for remote nodes using directory-based coherence protocol becomes higher, which greatly impacts system performance. Previous techniques such as data replication and data migration optimize the performance of the requesting core, but offer little improvement for neighbor nodes. Other techniques such as in-transit optimization try to reduce latency at the cost of increased storage. This paper introduces hierarchical cache directory into CMP (chip multiprocessor), which divides CMP tiles into multiple regions hierarchically, and combines it with data replication. A new directory organization is proposed to record the share status within a region and assist the regional home to complete operation efficiently. Simulation results show that for a 16-core CMP, compared to traditional directory, hierarchical cache directory reduces average access latency by 9% and on-chip network traffic by 34% on average with less storage. Theoretical analyses show that for a 2 n × 2 n tiled CMP, the average access latency in hierarchical cache directory asymptotically approaches a function that is independent of n, hence the architecture is highly scalable.
Abstract. Scaling DRAM will be increasingly difficult due to power and cost constraint. Phase Change Memory (PCM) is an emerging memory technology that can increase main memory capacity in a cost-effective and power-efficient manner. However, PCM incurs relatively long latency , high write energy, and finite endurance. To make PCM an alternative for scalable main memory, write traffic to PCM should be reduced, where memory replacement policy could play a vital role.In this paper, we propose a Read-Write Aware policy (RWA) to reduce write traffic without performance degradation. RWA explores the asymmetry of read and write costs of PCM, and prevents dirty data lines from frequent evictions. Simulations results on an 8-core CMP show that for memory organization with and without DRAM buffer, RWA can achieve 33.1% and 14.2% reduction in write traffic to PCM respectively. In addition, an Improved RWA (I-RWA) is proposed that takes into consideration the write access pattern and can further improve memory efficiency. For organization with DRAM buffer, I-RWA provides a significant 42.8% reduction in write traffic. Furthermore, both RWA and I-RWA incurs no hardware overhead and can be easily integrated into existing hardware.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations –citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.