SC14: International Conference for High Performance Computing, Networking, Storage and Analysis 2014
DOI: 10.1109/sc.2014.19

Dissecting On-Node Memory Access Performance: A Semantic Approach

Abstract: Optimizing memory access is critical for performance and power efficiency. CPU manufacturers have developed sampling-based performance measurement units (PMUs) that report precise costs of memory accesses at specific addresses. However, this data is too low-level to be meaningfully interpreted and contains an excessive amount of irrelevant or uninteresting information. We have developed a method to gather fine-grained memory access performance data for specific data objects and regions of code with low…

Cited by 21 publications (20 citation statements)
References 22 publications
“…5 Analyzing tasks concurrency gives a room for thread placement Here, we will illustrate a situation where a visual hint can be useful to choose thread placement. In this example the monitoring mode is the per node one: we monitor all the processes of the node.…”
Section: Methods (mentioning)
confidence: 99%
“…Former approaches were often less portable or do not expose as many details about cache sharing etc. MemAxes [5] offers fine-grained memory performance analysis with a graphical radial hierarchy display. However, it only focuses on static post-mortem analysis of memory accesses while our approach is dynamic and works for all performance metrics and more kinds of resource sharing.…”
Section: Introduction (mentioning)
confidence: 99%
“…7 As reported by each process by the Virtual Memory High-Water Mark (VmHWM) in /proc/self/status before the process termination. 8 The overhead is calculated using the reported FOM.…”
Section: A System Setup (mentioning)
confidence: 99%
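The VmHWM measurement mentioned in the statement above can be illustrated with a minimal sketch. It assumes a Linux system where `/proc/self/status` contains a line of the form `VmHWM:   <value> kB`; the helper names `parse_vmhwm_kib` and `read_own_vmhwm_kib` are hypothetical and are not taken from the citing paper.

```python
import re
from pathlib import Path
from typing import Optional


def parse_vmhwm_kib(status_text: str) -> Optional[int]:
    """Return the VmHWM value (peak resident set size, in kiB) from the
    text of a /proc/<pid>/status file, or None if the field is absent."""
    match = re.search(r"^VmHWM:\s+(\d+)\s+kB$", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_own_vmhwm_kib() -> Optional[int]:
    """Read VmHWM for the current process; returns None where procfs
    is unavailable (for example, on non-Linux systems)."""
    path = Path("/proc/self/status")
    if not path.exists():
        return None
    return parse_vmhwm_kib(path.read_text())


if __name__ == "__main__":
    # Called just before exit, this approximates the process's peak memory use.
    print("VmHWM (kiB):", read_own_vmhwm_kib())
```

Reading the value just before process termination, as the statement describes, captures the high-water mark over the whole run.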
“…Precise-Event Based Sampling (PEBS) is the implementation of such a feature in recent Intel processors [6]. Performance analysis tools such as HPCToolkit [7], MemAxes [8], Extrae [9], and Intel® VTune™ Amplifier [10] use a hybrid approach combining instrumentation to track data allocation and PEBS to monitor the application data references. This approach enables exploring in-production executions with a reduced overhead at the cost of providing statistical approximations, even though approximations for long runs resemble the actual results.…”
Section: Introduction (mentioning)
confidence: 99%
“…The MemAxes tool described in [2] provides visualizations of NUMA data in semantic contexts, including node hardware topology, source code, and application-specific. These visualizations assist the programmer in finding and fixing NUMA performance bottlenecks. The visualization part of the MemAxes tool has been released for general use, but not the measurement part, because the authors have written their own kernel module to access the low-level NUMA performance data.…”
Section: Related Work (mentioning)
confidence: 99%