Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache

Jevdjic, Djordje; Loh, Gabriel H.; Kaynak, Cansu; Falsafi, Babak

doi:10.1109/micro.2014.51

Cited by 150 publications (122 citation statements, 122 of them mentioning)

References 32 publications

“…To best utilize fast, energy-efficient in-package DRAM without burdening software writers, many researchers propose to use it as a large last-level cache [20,21,22,26,27,29,38]. This is justified by the fact that the in-package DRAM capacity, ranging from hundreds of megabytes to several gigabytes [27], is still not big enough to completely replace the main memory especially for emerging applications with huge memory footprints [16] such as in-memory database [7] and genome assemblies [31].…”

Section: Introduction (mentioning; confidence: 99%)

“…To alleviate the problems associated with large tags, page-based DRAM caches have recently been proposed to cache at a page granularity, typically ranging from 1 to 8 kilobytes [20,21,22]. In addition to reducing the tag overhead, page-based caches have additional benefits of higher hit rate by better exploiting spatial locality and maximum DRAM access efficiency by amortizing row activation cost.…”

Section: Introduction (mentioning; confidence: 99%)
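The spatial-locality benefit claimed for page-based caches can be illustrated with a toy, fully associative LRU model. This is a hypothetical sketch, not any of the cited designs: the class name `LRUCache`, the 64-byte block size, and the 4 KB page size (within the quoted 1 to 8 KB range) are assumptions. Both caches hold 64 KB, yet a sequential sweep misses on every 64-byte block at block granularity and only once per page at page granularity.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that tracks residency at a fixed
    granularity (unit_bytes = block size or page size). Hypothetical model."""

    def __init__(self, capacity_units: int, unit_bytes: int) -> None:
        self.capacity_units = capacity_units
        self.unit_bytes = unit_bytes
        self.resident = OrderedDict()  # unit number -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        unit = addr // self.unit_bytes  # block number or page number
        if unit in self.resident:
            self.resident.move_to_end(unit)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.resident) >= self.capacity_units:
            self.resident.popitem(last=False)  # evict the least recently used unit
        self.resident[unit] = None  # fill the whole block or page
        return False


LINE = 64    # assumed block size in bytes
PAGE = 4096  # assumed page size in bytes
stream = [i * LINE for i in range(256)]  # sequential sweep over 16 KB

block_cache = LRUCache(capacity_units=1024, unit_bytes=LINE)  # 64 KB total
page_cache = LRUCache(capacity_units=16, unit_bytes=PAGE)     # 64 KB total
for a in stream:
    block_cache.access(a)
    page_cache.access(a)

print(block_cache.hits, block_cache.misses)  # 0 256
print(page_cache.hits, page_cache.misses)    # 252 4
```

The same model also exposes the cost side: when accesses touch only a few blocks per page, each page-granularity miss still fetches the whole page, which is the over-fetching problem mentioned in a later excerpt.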

“…However, the tag overhead is still significant. For 1GB in-package DRAM, most of the existing page-based DRAM caches either require multi-megabytes of on-die SRAM (i.e., 2MB for [21,22]) or allocate 32-64MB in-package DRAM [20] just for tags. This leads to significant overhead in terms of latency, chip area, and energy consumption, which is likely to increase as the in-package DRAM size continues to scale up.…”

Section: Introduction (mentioning; confidence: 99%)
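The tag-overhead figures follow from simple arithmetic: one tag entry per cached unit, so tag storage grows linearly with cache capacity. The sketch below assumes 8-byte entries for 4 KB pages and 4-byte entries for 64-byte blocks; these entry sizes and the helper `tag_store_bytes` are hypothetical, not the layouts of the cited designs. With these assumptions the 1 GB page-granularity case happens to land at 2 MB, the size the excerpt quotes for [21,22], but that is not a claim about how those designs pack their tags.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tag_store_bytes(cache_bytes: int, unit_bytes: int, entry_bytes: int) -> int:
    """Tag storage = (number of cached units) x (bytes per tag entry).
    entry_bytes is an assumed size covering the tag plus state bits."""
    num_entries = cache_bytes // unit_bytes
    return num_entries * entry_bytes


# Hypothetical entry sizes; real designs pack tags and metadata differently.
page_tags = tag_store_bytes(1 * GIB, 4 * KIB, entry_bytes=8)
block_tags = tag_store_bytes(1 * GIB, 64, entry_bytes=4)
print(page_tags / MIB)   # 2.0
print(block_tags / MIB)  # 64.0

# Page-granularity tag storage scales linearly with DRAM cache capacity.
for gib in (1, 2, 4, 8):
    mib = tag_store_bytes(gib * GIB, 4 * KIB, entry_bytes=8) / MIB
    print(f"{gib} GiB cache -> {mib:.0f} MiB of page tags")
```

The loop shows why the overhead "is likely to increase as the in-package DRAM size continues to scale up": doubling capacity doubles the tag store unless the tags are eliminated altogether.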

“…Eviction from the victim cache is performed asynchronously to take write-back overhead off from the cache access path by having a small number of free blocks always available. Note that techniques to alleviate the over-fetching problem for page-based caches, such as footprint caching [21], hot/cold page tracking [20,22], are complementary and can augment our work.…”

Section: Introduction (mentioning; confidence: 99%)
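The asynchronous-eviction idea in the excerpt above (keep a small number of free blocks available so write-backs happen off the access path) can be sketched as a free pool with a low-watermark refill step. This is a hypothetical Python model, not the authors' implementation: `VictimCache`, `free_target`, and the LRU replacement policy are assumptions.

```python
from collections import OrderedDict


class VictimCache:
    """Toy victim cache that keeps a few free frames in reserve so that an
    insertion never waits for an eviction or write-back. Hypothetical sketch."""

    def __init__(self, num_frames: int, free_target: int) -> None:
        self.free = list(range(num_frames))  # free frame numbers
        self.lru = OrderedDict()             # page -> (frame, dirty), LRU first
        self.free_target = free_target       # frames to keep in reserve
        self.writebacks = []                 # pages written back to memory

    def insert(self, page: int, dirty: bool) -> int:
        """Fast path: take a reserved free frame; no eviction happens here."""
        if not self.free:
            raise RuntimeError("free pool exhausted; background reclaim fell behind")
        frame = self.free.pop()
        self.lru[page] = (frame, dirty)
        return frame

    def reclaim(self) -> None:
        """Background step: evict LRU pages until the free pool is refilled."""
        while len(self.free) < self.free_target and self.lru:
            page, (frame, dirty) = self.lru.popitem(last=False)
            if dirty:
                self.writebacks.append(page)  # write-back done off the access path
            self.free.append(frame)


vc = VictimCache(num_frames=4, free_target=2)
vc.insert(10, dirty=True)
vc.insert(11, dirty=False)
vc.reclaim()               # 2 free frames already: nothing to do
vc.insert(12, dirty=False)  # free pool drops to 1
vc.reclaim()               # evicts LRU page 10 (dirty), refilling the pool to 2
print(vc.writebacks, len(vc.free))  # [10] 2
```

In hardware the `reclaim` step would run concurrently with demand accesses; calling it explicitly here just makes the ordering visible.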


A fully associative, tagless DRAM cache

Lee, Kim, Jang et al. 2015

Proceedings of the 42nd Annual International Symposium on Computer Architecture


This paper introduces a tagless cache architecture for large in-package DRAM caches. The conventional die-stacked DRAM cache has both a TLB and a cache tag array, which are responsible for virtual-to-physical and physical-to-cache address translation, respectively. We propose to align the granularity of caching with the OS page size and take a unified approach to address translation and cache tag management. To this end, we introduce a cache-map TLB (cTLB), which stores virtual-to-cache, instead of virtual-to-physical, address mappings. At a TLB miss, the TLB miss handler allocates the requested block into the cache if it is not cached yet, and updates both the page table and the cTLB with the virtual-to-cache address mapping. Assuming the availability of large in-package DRAM caches, this ensures that an access to the memory region within the TLB reach always hits in the cache with low hit latency, since a TLB access immediately returns the exact location of the requested block in the cache, hence saving a tag-checking operation. The remaining cache space is used as a victim cache for memory pages recently evicted from the cTLB. By completely eliminating data structures for cache tag management, from either on-die SRAM or in-package DRAM, the proposed DRAM cache achieves the best scalability and hit latency while maintaining the high hit rate of a fully associative cache. Our evaluation with 3D Through-Silicon Via (TSV)-based in-package DRAM demonstrates that the proposed cache improves the IPC and energy efficiency by 30.9% and 39.5%, respectively, compared to the baseline with no DRAM cache. These numbers translate to 4.3% and 23.8% improvements over an impractical SRAM-tag cache requiring megabytes of on-die SRAM storage, due to low hit latency and zero energy waste for cache tags.
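The cTLB translation path described in this abstract can be sketched as follows. This is a minimal, hypothetical Python model, not the authors' design: it assumes page-table entries tagged as either `("mem", ppn)` or `("cache", frame)`, and a cTLB that maps virtual page numbers directly to cache frame numbers. The class `CTLBModel` and its fields are illustrative names, and replacement in both the cTLB and the victim region is omitted.

```python
class CTLBModel:
    """Hypothetical cache-map TLB model: translations point at DRAM-cache
    frames, so a cTLB hit yields the data location with no tag check."""

    def __init__(self, num_frames: int) -> None:
        self.page_table = {}  # vpn -> ("mem", ppn) or ("cache", frame)
        self.ctlb = {}        # vpn -> cache frame (virtual-to-cache mapping)
        self.free_frames = list(range(num_frames))
        self.tlb_misses = 0

    def translate(self, vpn: int) -> int:
        if vpn in self.ctlb:         # cTLB hit: cache location known directly
            return self.ctlb[vpn]
        self.tlb_misses += 1         # cTLB miss: run the miss handler
        kind, where = self.page_table[vpn]
        if kind == "mem":            # not cached yet: allocate a frame, fill the page
            frame = self.free_frames.pop()  # (eviction when full is omitted)
            self.page_table[vpn] = ("cache", frame)  # page table now points into the cache
        else:                        # already cached, e.g. still in the victim region
            frame = where
        self.ctlb[vpn] = frame       # install the virtual-to-cache mapping
        return frame


m = CTLBModel(num_frames=4)
m.page_table[7] = ("mem", 0x1234)
first = m.translate(7)   # miss: page 7 allocated into frame 3
second = m.translate(7)  # hit: no tag check needed
m.ctlb.pop(7)            # page leaves the cTLB but stays cached (victim region)
third = m.translate(7)   # miss, but no new allocation: same frame
print(first, second, third, m.tlb_misses)  # 3 3 3 2
```

The last three lines mirror the abstract's victim-cache behavior: a page evicted from the cTLB keeps its cache frame, so a later miss only reinstalls the mapping.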


“…Our choice was motivated by the multidimensional effects of a 3D architecture, as compared to a 2D design, and by the increased interest around this emerging technology [19], [11]¹. The 3D stacking technique improves performance, due to the higher bandwidth and lower latency of the on-chip DRAM.…”

Section: Representative Case Study (mentioning; confidence: 99%)

Toward Multi-Layer Holistic Evaluation of System Designs

Kleanthous, Sazeides, Özer et al. 2016

IEEE Comput. Arch. Lett.


The common practice for quantifying the benefit(s) of design-time architectural choices of server processors is often limited to the chip or server level. This quantification process invariably entails the use of salient metrics, such as performance, power, and reliability, which capture (in a tangible manner) a design's overall ramifications. This paper argues for the necessity of a more holistic evaluation approach, which considers metrics across multiple integration levels (chip, server, and datacenter). In order to facilitate such a comprehensive evaluation, we utilize an aggregate metric, e.g., the Total Cost of Ownership (TCO), to harness the complexity of comparing multiple metrics at multiple levels. We motivate our proposition for holistic evaluation with a case study that compares a 2D processor to a 3D processor at various design integration levels. We show that while a 2D processor is clearly the best choice at the processor level, the conclusion is reversed at the datacenter level, where the 3D processor becomes a better choice. This result emanates mainly from the performance benefits of processor-DRAM 3D integration, and from the ability to amortize (at the datacenter level) the higher 3D per-server cost and lower reliability by requiring fewer 3D servers to match the same performance.
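The amortization argument can be made concrete with a toy TCO model. Everything here is a hypothetical illustration, not the paper's model or data: the `ServerDesign` fields, the cost formula (purchase cost plus yearly operating and failure-repair costs per server), and all numbers are assumptions. Fleet size is the number of servers needed to reach a target throughput, so a faster but pricier, less reliable 3D server can still yield a lower total cost.

```python
import math
from dataclasses import dataclass


@dataclass
class ServerDesign:
    perf: float           # throughput per server (arbitrary units)
    capex: float          # purchase cost per server
    opex_per_year: float  # power, cooling, space per server per year
    failure_rate: float   # expected failures per server per year
    repair_cost: float    # cost per failure


def tco(design: ServerDesign, target_perf: float, years: int) -> float:
    """Toy TCO: fleet size needed for target_perf times per-server lifetime cost."""
    servers = math.ceil(target_perf / design.perf)
    per_server = design.capex + years * (
        design.opex_per_year + design.failure_rate * design.repair_cost
    )
    return servers * per_server


# Hypothetical numbers: the 3D server is faster but costs more and fails more often.
d2 = ServerDesign(perf=10, capex=4000, opex_per_year=1000, failure_rate=0.02, repair_cost=500)
d3 = ServerDesign(perf=13, capex=5000, opex_per_year=1000, failure_rate=0.05, repair_cost=800)

print(tco(d2, target_perf=1000, years=3))  # 100 servers x 7030 = 703000.0
print(tco(d3, target_perf=1000, years=3))  # 77 servers x 8120 = 625240.0
```

Per server, the 3D design costs more (8120 vs. 7030 over three years), but needing 77 instead of 100 servers more than offsets that, which is the kind of datacenter-level reversal the abstract describes.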


A Fast Joint Application-Architecture Exploration Platform for Heterogeneous Systems

Maeda, Yang et al. 2019

Embedded, Cyber-Physical, and IoT Systems
