Adaptive line placement with the set balancing cache

Rolan, Dyer; Fraguela, Basilio B.; Doallo, Ramón

doi:10.1145/1669112.1669178

Cited by 51 publications

(40 citation statements)

References 22 publications


“…To improve access latency, a hit in a secondary location causes the primary and secondary locations to be swapped. This scheme has been extended with better ways to predict which location to probe first [10], higher associativities [45], and schemes that explicitly identify the less used sets and use them to store the more used ones [37]. The drawbacks of allowing multiple locations per way are the variable hit latency and reduced cache bandwidth due to multiple lookups, and the additional energy required to do swaps on hits.…”

Section: B. Approaches That Increase the Number of Locations (mentioning)

confidence: 99%
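The swap-on-secondary-hit policy in the first excerpt can be illustrated with a toy cache in which every block has exactly two candidate slots. This is a minimal sketch that assumes a column-associative-style pairing (the secondary slot flips the most significant index bit). The class name, the pairing, and the placement rule on a double miss are assumptions for illustration, not the design of any specific cited paper.

```python
class TwoLocationCache:
    """Hypothetical direct-mapped cache in which every block has two possible
    locations: its primary slot and a secondary slot with the top index bit flipped."""

    def __init__(self, num_sets):
        if num_sets < 2 or num_sets & (num_sets - 1):
            raise ValueError("num_sets must be a power of two >= 2")
        self.num_sets = num_sets
        self.top_bit = num_sets >> 1
        self.lines = [None] * num_sets  # block address held by each slot, or None

    def primary(self, addr):
        return addr % self.num_sets

    def secondary(self, addr):
        return self.primary(addr) ^ self.top_bit

    def access(self, addr):
        p, s = self.primary(addr), self.secondary(addr)
        if self.lines[p] == addr:
            return "hit-primary"  # a single lookup
        if self.lines[s] == addr:
            # Secondary hit: a second lookup plus a swap, so the next access
            # to this block hits in its primary slot.
            self.lines[p], self.lines[s] = self.lines[s], self.lines[p]
            return "hit-secondary"
        # Miss in both slots (simplifying assumption): the new block takes the
        # primary slot, and the block it displaces moves to the other slot of
        # the same pair, which is also a legal slot for that block. Whatever was
        # in that other slot is evicted.
        self.lines[s] = self.lines[p]
        self.lines[p] = addr
        return "miss"
```

In a plain direct-mapped cache with 8 sets, addresses 3 and 11 map to the same set and evict each other on every alternating access. In this sketch both blocks stay resident. The cost is a second probe and a swap whenever a block is found in its secondary slot, which are the drawbacks the excerpt names.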

“…Most alternative approaches to improve associativity rely on increasing the number of locations where a block can be placed (with e.g. multiple locations per way [1,10,37], victim caches [3,25] or extra levels of indirection [18,36]). Increasing the number of possible locations of a block ultimately increases the energy and latency of cache hits, and many of these schemes are more complex than conventional cache arrays (requiring e.g.…”

Section: Introduction (mentioning)

confidence: 99%


The ZCache: Decoupling Ways and Associativity

Sánchez

Kozyrakis

2010

2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

166

110


Abstract: The ever-increasing importance of main memory latency and bandwidth is pushing CMPs towards caches with higher capacity and associativity. Associativity is typically improved by increasing the number of ways. This reduces conflict misses, but increases hit latency and energy, placing a stringent trade-off on cache design. We present the zcache, a cache design that allows much higher associativity than the number of physical ways (e.g. a 64-associative cache with 4 ways). The zcache draws on previous research on skew-associative caches and cuckoo hashing. Hits, the common case, require a single lookup, incurring the latency and energy costs of a cache with a very low number of ways. On a miss, additional tag lookups happen off the critical path, yielding an arbitrarily large number of replacement candidates for the incoming block. Unlike conventional designs, the zcache provides associativity by increasing the number of replacement candidates, but not the number of cache ways. To understand the implications of this approach, we develop a general analysis framework that allows one to compare associativity across different cache designs (e.g. a set-associative cache and a zcache) by representing associativity as a probability distribution. We use this framework to show that for zcaches, associativity depends only on the number of replacement candidates, and is independent of other factors (such as the number of cache ways or the workload). We also show that, for the same number of replacement candidates, the associativity of a zcache is superior to that of a set-associative cache for most workloads. Finally, we perform detailed simulations of multithreaded and multiprogrammed workloads on a large-scale CMP with zcache as the last-level cache. We show that zcaches provide higher performance and better energy efficiency than conventional caches without incurring the overheads of designs with a large number of ways.
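The abstract separates the hit path (one probe per way) from the miss path (an off-critical-path walk that gathers many replacement candidates). The toy model below illustrates that split under stated assumptions. Each way has its own hash. On a miss, a breadth-first walk over the alternative locations of the blocks already sitting in the incoming block's slots collects up to `max_candidates` candidates. The least recently used candidate is evicted, and every block on the path from the incoming slot to the victim moves one step. The class name, the salted-tuple hash, LRU, and the BFS order are all hypothetical choices, not the paper's hardware design.

```python
import random


class ZCacheSketch:
    """Toy zcache-style cache (hypothetical names and hashing, for illustration).

    Each of `ways` ways is indexed by its own hash function. A hit probes one
    slot per way. A miss walks alternative locations breadth-first to collect
    up to `max_candidates` replacement candidates, evicts the least recently
    used one, and shifts the blocks on the path to make room at the root.
    """

    def __init__(self, ways, lines_per_way, max_candidates, seed=0):
        self.ways = ways
        self.lines_per_way = lines_per_way
        self.max_candidates = max_candidates
        rng = random.Random(seed)
        # Assumed hash family: Python's hash of a (per-way salt, address) tuple.
        self.salts = [rng.getrandbits(32) for _ in range(ways)]
        self.array = [[None] * lines_per_way for _ in range(ways)]
        self.last_use = {}  # block address -> timestamp of last access (LRU)
        self.clock = 0

    def index(self, way, addr):
        return hash((self.salts[way], addr)) % self.lines_per_way

    def lookup(self, addr):
        # Hit path: a single probe per way, like a cache with few ways.
        return any(self.array[w][self.index(w, addr)] == addr
                   for w in range(self.ways))

    def access(self, addr):
        self.clock += 1
        hit = self.lookup(addr)
        if not hit:
            self._replace(addr)
        self.last_use[addr] = self.clock
        return "hit" if hit else "miss"

    def _priority(self, cand):
        # Empty slots are always preferred; otherwise the oldest timestamp wins.
        block = self.array[cand[0]][cand[1]]
        return -1 if block is None else self.last_use[block]

    def _replace(self, addr):
        # Candidate = (way, row, index of parent candidate, or -1 for a root).
        cands = [(w, self.index(w, addr), -1) for w in range(self.ways)]
        seen = {(w, r) for w, r, _ in cands}
        i = 0
        while i < len(cands) and len(cands) < self.max_candidates:
            w, r, _ = cands[i]
            block = self.array[w][r]
            if block is not None:
                # The block in this slot could move to its slot in any other way.
                for w2 in range(self.ways):
                    slot = (w2, self.index(w2, block))
                    if w2 != w and slot not in seen and len(cands) < self.max_candidates:
                        seen.add(slot)
                        cands.append((w2, slot[1], i))
            i += 1
        w, r, parent = min(cands, key=self._priority)
        victim = self.array[w][r]
        if victim is not None:
            del self.last_use[victim]
        # Walk from the victim back to the root, moving each parent's block
        # one step down the path (each move lands in a legal slot for that block).
        while parent != -1:
            pw, pr, grandparent = cands[parent]
            self.array[w][r] = self.array[pw][pr]
            w, r, parent = pw, pr, grandparent
        self.array[w][r] = addr
        return victim
```

With `ways=4` and `max_candidates=16`, every hit still costs four probes, while each eviction chooses among up to sixteen blocks. Raising `max_candidates` only lengthens the miss-time walk, which is the decoupling of ways and associativity the abstract describes.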


“…However, due to its limited capacity, it is not particularly useful when the number of sets with large local misses are considerably large [8]. Inspired by this scheme, Scavenger [3]… …[30], Scavenger [3], and SBC [32] caches in terms of the percentage reduction in cache misses relative to the SLC cache in previous configuration. Note that cache sizes are set to have same die area.…”

Section: Reducing Conflict Misses in Caches (mentioning)

confidence: 99%

“…We use three schemes for comparison, including the V-Way cache [30], the Scavenger cache [3], and the dynamic SBC cache [32]. For fair analysis, these three approaches are applied to SLC with the same die size.…”

Section: Comparative Analysis (mentioning)

confidence: 99%
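The excerpts above report each scheme as a percentage reduction in cache misses relative to the baseline SLC cache. The metric itself is simple arithmetic. Here is a minimal sketch; the function name and the miss counts in the checks are hypothetical, not figures from the cited study.

```python
def miss_reduction_pct(baseline_misses, scheme_misses):
    """Percentage reduction in misses of a scheme relative to a baseline cache.

    Positive means fewer misses than the baseline; negative means more.
    """
    if baseline_misses <= 0:
        raise ValueError("baseline must have at least one miss")
    return 100.0 * (baseline_misses - scheme_misses) / baseline_misses
```

The equal-die-area condition in the excerpts constrains how each compared configuration is sized. It does not enter this formula.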

A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems

Jadidi

Arjomand

Kandemir

et al. 2017

Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems


In this paper, we present a novel cache design based on Multi-Level Cell Spin-Transfer Torque RAM (MLC STT-RAM) that can dynamically adapt the set capacity and associativity to efficiently exploit the full potential of MLC STT-RAM. We exploit the asymmetric nature of the MLC storage scheme to build cache lines with heterogeneous performance: half of the cache lines are read-friendly, while the other half are write-friendly. Furthermore, we propose to opportunistically deactivate ways in underutilized sets to convert MLC to Single-Level Cell (SLC) mode, which offers better overall performance and lifetime. Our ultimate goal is to build a cache architecture that combines the capacity advantages of MLC with the performance/energy advantages of SLC. Our experiments show an improvement of 43% in the total number of conflict misses, 27% in memory access latency, 12% in system performance (i.e., IPC), and 26% in L3 access energy, with a slight degradation in cache lifetime (about 7%) compared to an SLC cache.


“…A thorough understanding of program behaviors is a prerequisite to efficient architecture designs and optimizations [1], [2], [3], [4]. In most cases, program behaviors are studied through detailed simulation.…”

Section: Introduction (mentioning)

confidence: 99%

Accelerating the Extraction of Representative Behaviors of Programs with Dynamic Binary Translation

Zhao

Jiang

et al. 2011

2011 IEEE International Conference on High Performance Computing and Communications


Program behavior analysis is the foundation of computer architecture research. Therefore, it is vital to be able to extract the representative behaviors of programs efficiently. Representative behaviors of programs are usually extracted through the SimPoint methodology. However, generating BBV (Basic Block Vector) profiles for SimPoint is usually quite slow. This paper evaluates the effectiveness of accelerating BBV profile generation with dynamic binary translation. First, a general framework for BBV profile generation using dynamic binary translation is presented. Then several optimization techniques and accuracy enhancements are proposed. Based on the framework and the optimizations, a highly efficient BBV profile generator, QPoint, is presented. The performance, overhead, and accuracy of QPoint are evaluated using the SPEC2006 benchmark suite. Experimental results show that the proposed optimizations improve performance by up to 147%, and by 56% on average. The optimized QPoint is up to 40x faster, and 10.5x faster on average, than a BBV profile generator based on functional simulation. The overhead incurred by BBV profile gathering is less than 4%, the lowest among existing tools. The accuracy of QPoint is also validated against a tool based on functional simulation. Compared with existing tools, QPoint has two main advantages. First, it is very fast, reaching up to 292 MIPS, and 109 MIPS on average, on an ordinary PC. Second, QPoint supports most architectures, including x86/x86-64, ARM, POWER, SPARC, and MIPS, and can be used to generate cross-platform BBV profiles.
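The abstract assumes familiarity with what a BBV profile contains. The sketch below shows one plausible construction under stated assumptions. The executed basic blocks are already available as hypothetical `(block_id, instructions_in_block)` pairs. The trace is cut into intervals of roughly `interval_size` instructions, at basic-block granularity. Each interval's vector weights every block's execution count by its size and is normalized to sum to 1. The function name and trace format are illustrative. QPoint's actual contribution, gathering these counts inside a dynamic binary translator, is not modeled.

```python
from collections import defaultdict


def basic_block_vectors(trace, interval_size):
    """Build one normalized basic block vector per interval of execution.

    trace: iterable of (block_id, instructions_in_block), one entry per
    executed basic block. An interval closes once at least `interval_size`
    instructions have executed; a trailing partial interval is also emitted.
    """
    def normalize(counts):
        total = sum(counts.values())
        return {block: n / total for block, n in counts.items()}

    vectors = []
    current = defaultdict(int)  # block_id -> instructions executed in this interval
    executed = 0
    for block_id, size in trace:
        current[block_id] += size  # execution count weighted by block size
        executed += size
        if executed >= interval_size:
            vectors.append(normalize(current))
            current = defaultdict(int)
            executed = 0
    if current:
        vectors.append(normalize(current))
    return vectors
```

Normalizing each vector makes intervals that overshoot `interval_size` by different amounts comparable before they are clustered.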

