Enabling Partial‐Cache Line Prefetching through Data Compression

Zhang, Youtao; Gupta, Rajiv

doi:10.1002/0471732710.ch9

Cited by 8 publications

(8 citation statements)

References 10 publications


“…Zhang et al [61] propose a technique which uses cache compression to reduce memory bandwidth consumption and further utilizes the freed bandwidth to prefetch additional compressible values. With each cache line in memory, their technique associates another line which acts as the prefetch candidate.…”

Section: Interaction With Other Techniques (mentioning)

confidence: 99%

A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems

Mittal

Vetter

2016

IEEE Trans. Parallel Distrib. Syst.


As the number of cores on a chip increases and key applications become even more data-intensive, memory systems in modern processors have to deal with increasingly large amounts of data. In the face of such challenges, data compression is a promising approach to increase effective memory system capacity and also provide performance and energy advantages. This paper presents a survey of techniques for using compression in cache and main memory systems. It also classifies the techniques based on key parameters to highlight their similarities and differences. It discusses compression in CPUs and GPUs, conventional and non-volatile memory (NVM) systems, and 2D and 3D memory systems. We hope that this survey will help researchers gain insight into the potential role of compression in the memory components of future extreme-scale systems.


“…There has been much research done on using data compression to expand the effective cache or memory capacity [2,3,48,10,51,47,41,50,20,46]. IBM has commercialized the first compressed memory subsystem MXT [1,41] which uses specialized hardware to compress and decompress the entire memory.…”

Section: Related Work (mentioning)

confidence: 99%

Miss-Correlation Folding: Encoding Per-Block Miss Correlations in Compressed DRAM for Data Prefetching

Liu

Peir

Lee

2012

2012 IEEE 26th International Parallel and Distributed Processing Symposium


Cache misses frequently exhibit repeated streaming behavior, i.e. a sequence of cache misses has a high tendency of being repeated. Correlation-based prefetchers record the missing streams in a history table for accurate prefetching. Saving a large miss history in off-chip DRAM is a practical implementation, but incurs access latency and consumes memory bandwidth, which leads to performance degradation. In this paper, we investigate a new data prefetching mechanism based on per-block miss correlation, where a miss is correlated with an earlier miss when the two misses are encountered close together in both time and space. The miss correlations are captured dynamically and saved along with the content of the data block using a simple data compression technique. As a result of this novel combination, our scheme provides unbounded correlation history, and its prefetch metadata can be fetched together with demand data without incurring additional latency or consuming any memory bandwidth. Performance evaluations using data-parallel applications demonstrate that prefetchers based on per-block miss correlations can improve IPC by 42-139%, with an average of 88%, compared to the IPC without prefetching. Compared with a regular stream prefetcher, a sampled temporal streaming prefetcher, and a spatial-temporal memory streaming prefetcher, IPC improvements of up to 115%, 99%, and 98% can be obtained, with averages of about 36%, 26%, and 27%, respectively.


“…Zhang and Gupta [46] exploit their compressed cache design [47] to prefetch partial compressed lines from the next level in the memory hierarchy. Lee, et al.…”

Section: Related Work (mentioning)

confidence: 99%

Interactions Between Compression and Prefetching in Chip Multiprocessors

Alameldeen

Wood

2007

2007 IEEE 13th International Symposium on High Performance Computer Architecture


In chip multiprocessors (CMPs), multiple cores compete for shared resources such as on-chip caches and off-chip pin bandwidth. Stride-based hardware prefetching increases demand for these resources, causing contention that can degrade performance (up to 35% for one of our benchmarks). In this paper, we first show that cache and link (off-chip interconnect) compression can increase the effective cache capacity (thereby reducing off-chip misses) and increase the effective off-chip bandwidth (reducing contention). On an 8-processor CMP with no prefetching, compression improves performance by up to 18% for commercial workloads. Second, we propose a simple adaptive prefetching mechanism that uses cache compression's extra tags to detect useless and harmful prefetches. Furthermore, in the central result of this paper, we show that compression and prefetching interact in a strong positive way, resulting in combined performance improvement of 10-51% for seven of our eight workloads.
