2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2018.00061
Memory Hierarchy for Web Search

Cited by 60 publications (38 citation statements)
References 43 publications
“…Evaluating an architectural feature in a real system is not an easy task. Multiple factors make this challenging: general-purpose CPUs run a large spectrum of workloads with mixed bottlenecks [2,4]. For example, profiling results shared by this article indicate that integer benchmarks benefit greatly from better instruction fetch, while Floating-Point (FP) benchmarks benefit from an optimized execution engine.…”
Section: Difficulty
confidence: 95%
“…For each workload, we configure the overall memory capacity (i.e., second tier of the hierarchy) to be equal to the workload's dataset size (i.e., Data Serving, Web Search and Media Streaming have 16GB datasets, while Data Analytics and Web Serving have 32GB datasets). However, today's datacenter-scale applications can have much larger datasets that even span into the terabyte range [9,53]; since our work makes specific claims about the capacity ratio relating the two tiers of our memory hierarchy, we conducted a study to verify that our results stand for larger datasets.…”
Section: Evaluation Methodology
confidence: 99%
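The excerpt above describes provisioning the second tier of the memory hierarchy to match each workload's dataset size, with the first tier sized by a capacity ratio. A minimal sketch of that sizing rule, where the workload names and dataset sizes come from the excerpt but the `first_to_second_ratio` default is a hypothetical placeholder (the actual ratio is a claim of the cited paper, not stated here):

```python
# Sketch of the two-tier sizing rule described in the excerpt.
# Second tier = full dataset size; first tier = ratio * second tier.
# The 1/8 ratio below is illustrative only, not taken from the paper.

DATASET_GB = {
    "Data Serving": 16,
    "Web Search": 16,
    "Media Streaming": 16,
    "Data Analytics": 32,
    "Web Serving": 32,
}

def tier_sizes(dataset_gb, first_to_second_ratio=1 / 8):
    """Return (first_tier_gb, second_tier_gb) for one workload."""
    second = dataset_gb                      # second tier holds the whole dataset
    first = second * first_to_second_ratio   # first tier scaled by the ratio
    return first, second

for name, gb in DATASET_GB.items():
    first, second = tier_sizes(gb)
    print(f"{name}: tier1={first:g} GB, tier2={second} GB")
```

The point of fixing the ratio rather than the absolute first-tier size is that the paper's claims are stated in terms of the ratio, so the study scales to the terabyte-range datasets mentioned above.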
“…Huang et al [11] use SRAM banks for storing tags, reducing tag lookup latency. Other prior works [8], [9] propose an on-chip L4 DRAM cache based on eDRAM and 3D-stacked DRAM, respectively. By placing the L4 DRAM cache near the processors, or by deploying a private L4 cache for each core, we benefit from lower latency than off-chip DRAM.…”
Section: B L4 DRAM Cache Architecture
confidence: 99%
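The key idea quoted above (from [11]) is that keeping DRAM-cache tags in SRAM lets the hit/miss decision happen without a DRAM access. A hedged, direct-mapped toy model of that organization — all sizes, names, and the write-allocate policy here are illustrative assumptions, not details of the cited designs:

```python
# Toy model of an SRAM-tag DRAM cache: the tag array (fast SRAM) is
# consulted first, so a miss is detected without touching the DRAM data
# store. Direct-mapped, line-granular; parameters are illustrative.

LINE_BYTES = 64
NUM_SETS = 1024  # hypothetical: 1024 sets * 64 B lines = 64 KiB of data

class SramTagDramCache:
    def __init__(self):
        self.tags = [None] * NUM_SETS   # tag store, modelled as SRAM
        self.data = {}                  # data store, modelled as DRAM

    def access(self, addr):
        line = addr // LINE_BYTES
        idx, tag = line % NUM_SETS, line // NUM_SETS
        if self.tags[idx] == tag:       # SRAM lookup decides hit/miss
            return "hit"
        self.tags[idx] = tag            # miss: fill (write-allocate)
        self.data[idx] = line
        return "miss"

cache = SramTagDramCache()
print(cache.access(0x1000))  # miss
print(cache.access(0x1000))  # hit
```

With DRAM-resident tags, every miss would pay a DRAM latency just to learn it missed; the SRAM tag store removes that cost, which is the latency argument the excerpt makes.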
“…Such a DRAM cache can provide low latency and/or high bandwidth compared with main memory, and much larger capacity than an SRAM cache, depending on the type of DRAM and its implementation [7]. To maximize its performance benefits, most prior works propose various techniques for the DRAM cache [8]–[12].…”
Section: Introduction
confidence: 99%