2014 IEEE International Symposium on Workload Characterization (IISWC)
DOI: 10.1109/iiswc.2014.6983034
Performance analysis of the memory management unit under scale-out workloads

Abstract: Much attention has been given to the efficient execution of the scale-out applications that dominate datacenter computing. However, the effects of hardware support in the Memory Management Unit (MMU), in combination with the distinct characteristics of scale-out applications, have been largely ignored until recently. In this paper, we comprehensively quantify the MMU overhead on a real machine, leveraging performance counters, on a collection of emerging scale-out applications. We show that t…


Cited by 36 publications (40 citation statements)
References 43 publications
“…Previous work shows that limited TLB reach results in costly page walks that degrade application performance, often substantially [10,13,14,23,29,31]. Section 2 described the qualitative differences between RMM and the most closely related work on multipage mappings (sub-blocked TLBs [47], CoLT [39], Clustered TLBs [38]), huge pages [1,6,36], and direct segments [10,23], and Section 8 showed quantitatively that RMM substantially improves over them.…”
Section: Related Work
confidence: 99%
“…This overhead comes from the increased latency of the write operation in the NEMsCAM cell. However, the write operation: (i) takes place only after TLB misses which occur rarely compared to TLB hits, and (ii) adds latency to an already slow operation, i.e., L2-TLB access (∼7 cycles [17]) including potentially the penalty of L2-TLB miss (several tens of cycles [20]). Consequently, the NEMsCAM TLBs have negligible impact on the execution time for most workloads (0.32% on average) while reducing significantly the energy spent on the TLB hierarchy.…”
Section: B Results
confidence: 99%
“…In case of a hit, the TLB returns the physical address, and the memory operation proceeds. In case of a miss, the operation stalls until the translation is retrieved from memory, which might take tens of cycles [20].…”
Section: Introduction
confidence: 99%
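The hit/miss behavior the passage above describes can be sketched as a toy model. This is an illustrative sketch, not code from the paper: the latency constants (1-cycle hit, 40-cycle walk penalty) and the dictionary-based TLB are assumptions chosen to match the "tens of cycles" figure quoted above.

```python
TLB_HIT_CYCLES = 1      # assumed TLB hit latency (illustrative)
PAGE_WALK_CYCLES = 40   # assumed miss penalty ("tens of cycles")
PAGE_SIZE = 4096        # 4 KiB base pages

def translate(tlb, vaddr, page_table):
    """Return (physical_address, cycles_spent) for one memory access."""
    vpn = vaddr // PAGE_SIZE
    offset = vaddr % PAGE_SIZE
    if vpn in tlb:                   # TLB hit: translation is cached
        return tlb[vpn] * PAGE_SIZE + offset, TLB_HIT_CYCLES
    pfn = page_table[vpn]            # TLB miss: fetch translation from memory
    tlb[vpn] = pfn                   # install the entry for later reuse
    return pfn * PAGE_SIZE + offset, TLB_HIT_CYCLES + PAGE_WALK_CYCLES

page_table = {0: 7, 1: 9}            # toy VPN -> PFN mapping
tlb = {}
_, first = translate(tlb, 0x123, page_table)   # cold access: miss + walk
_, second = translate(tlb, 0x456, page_table)  # same page: hit
print(first, second)  # 41 1
```

The asymmetry between the two costs is the point of the quoted passage: a single miss is dozens of times more expensive than a hit, so even a small miss ratio can dominate translation cost.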
“…In case of a TLB miss, a hardware state machine walks the page table, a process named page walk, and fetches the corresponding page table entry from memory. Thus, the TLB is the most crucial component for accelerating virtual memory, and its miss ratio significantly affects the performance of the processor [13,15,30,36].…”
Section: Address Translation Hardware Support
confidence: 99%
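The page walk described above can be made concrete with a sketch of how a walker decomposes a virtual address. The four-level split and 9-bit indices below follow the standard x86-64 radix page-table format; the example address is arbitrary, and the sketch only computes indices rather than modeling memory latency.

```python
LEVELS = 4          # PML4, PDPT, PD, PT (x86-64 four-level paging)
INDEX_BITS = 9      # 512 entries per table level
PAGE_SHIFT = 12     # 4 KiB pages

def walk_indices(vaddr):
    """Split a 48-bit virtual address into the per-level table indices
    a hardware page walker would use, top level first."""
    indices = []
    for level in range(LEVELS):
        shift = PAGE_SHIFT + INDEX_BITS * (LEVELS - 1 - level)
        indices.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    return indices

# Each index selects one entry at its level, so a full walk costs
# LEVELS dependent memory accesses in the worst case.
print(walk_indices(0x7F1234567000))
```

Because each level's lookup depends on the previous one, the four accesses serialize, which is why a miss that walks the full table costs tens of cycles even when the table entries hit in cache.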
“…Only the recent work on TLB Pred [41] considers huge pages for improving the dynamic energy efficiency in TLBs. The performance of TLB Pred depends on huge pages successfully reducing misses, but prior work shows that huge pages can still incur high performance overheads due to TLB misses [13,15,36]. In response, researchers proposed techniques that further increase the TLB reach [13,22,35,42,43,50] to overcome the limitations of huge pages.…”
Section: Introduction
confidence: 97%
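The "TLB reach" that huge pages extend, as discussed above, is simple arithmetic: entries times page size. The sketch below illustrates it; the 1536-entry L2 TLB is an assumed figure for illustration, not a number taken from the paper.

```python
def tlb_reach_bytes(entries, page_size):
    """Memory mappable by a TLB at once: entries x page size."""
    return entries * page_size

KIB, MIB = 1024, 1024 * 1024
l2_entries = 1536                                  # assumed L2-TLB size

reach_4k = tlb_reach_bytes(l2_entries, 4 * KIB)    # 4 KiB base pages
reach_2m = tlb_reach_bytes(l2_entries, 2 * MIB)    # 2 MiB huge pages

print(reach_4k // MIB, reach_2m // MIB)  # 6 3072
```

With base pages the assumed TLB covers only 6 MiB, while 2 MiB huge pages raise reach to 3 GiB; yet scale-out workloads with working sets beyond a few gigabytes can still exceed even that, which is the limitation of huge pages the quoted passage points to.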