Dissecting On-Node Memory Access Performance: A Semantic Approach

Giménez, Alfredo; Gamblin, Todd; Rountree, Barry; Bhatelé, Abhinav; Jusufi, Ilir; Bremer, Peer-Timo; Hamann, Bernd

doi:10.1109/sc.2014.19

Cited by 21 publications

(20 citation statements)

References 22 publications

“…5 Analyzing tasks concurrency gives a room for thread placement Here, we will illustrate a situation where a visual hint can be useful to choose thread placement. In this example the monitoring mode is the per node one: we monitor all the processes of the node.…”

Section: Methods (mentioning)

confidence: 99%

“…Former approaches were often less portable or do not expose as many details about cache sharing etc. MemAxes [5] offers fine-grained memory performance analysis with a graphical radial hierarchy display. However, it only focuses on static post-mortem analysis of memory accesses while our approach is dynamic and works for all performance metrics and more kinds of resource sharing.…”

Section: Introduction (mentioning)

confidence: 99%

A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems

Denoyelle, Goglin, Jeannot. 2015. Euro-Par 2015: Parallel Processing Workshops

“…7 As reported by each process by the Virtual Memory High-Water Mark (VmHWM) in /proc/self/status before the process termination. 8 The overhead is calculated using the reported FOM.…”

Section: A System Setup (mentioning)

confidence: 99%
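
The statement above reads peak resident memory from the VmHWM field of /proc/self/status. A minimal sketch of how such a reading could be taken on Linux follows; the function names `parse_vm_hwm_kb` and `read_vm_hwm_kb` are hypothetical and are not taken from the cited paper, which does not show its measurement code.

```python
import re
from typing import Optional


def parse_vm_hwm_kb(status_text: str) -> Optional[int]:
    """Return the VmHWM value (in kB) found in /proc/<pid>/status text.

    Returns None when the field is absent. The helper name is
    hypothetical; only the VmHWM field itself comes from the source.
    """
    match = re.search(r"^VmHWM:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vm_hwm_kb(path: str = "/proc/self/status") -> Optional[int]:
    """Read the peak resident set size of the calling process, in kB.

    Returns None on systems without a readable /proc status file.
    """
    try:
        with open(path, "r", encoding="ascii") as status_file:
            return parse_vm_hwm_kb(status_file.read())
    except OSError:
        return None


if __name__ == "__main__":
    # Called shortly before exit, this approximates the per-process
    # high-water mark that the cited statement describes.
    print(read_vm_hwm_kb())
```

Sampling the value just before the process terminates matters because VmHWM only grows over the lifetime of the process, so a late reading captures the overall peak.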

“…Precise-Event Based Sampling (PEBS) is the implementation of such a feature in recent Intel processors [6]. Performance analysis tools such as HPCToolkit [7], MemAxes [8], Extrae [9], and Intel® VTune™ Amplifier [10] use a hybrid approach combining instrumentation to track data allocation and PEBS to monitor the application data references. This approach enables exploring in-production executions with a reduced overhead at the cost of providing statistical approximations, even though approximations for long runs resemble the actual results.…”

Section: Introduction (mentioning)

confidence: 99%

Automating the Application Data Placement in Hybrid Memory Systems

Servat, Peña, Llort, et al. 2017. 2017 IEEE International Conference on Cluster Computing (CLUSTER)

Abstract: Multi-tiered memory systems, such as those based on Intel® Xeon Phi™ processors, are equipped with several memory tiers with different characteristics including, among others, capacity, access latency, bandwidth, energy consumption, and volatility. The proper distribution of the application data objects into the available memory layers is key to shorten the time-to-solution, but the way developers and end-users determine the most appropriate memory tier to place the application data objects has not been properly addressed to date. In this paper we present a novel methodology to build an extensible framework to automatically identify and place the application's most relevant memory objects into the Intel Xeon Phi fast on-package memory. Our proposal works on top of in-production binaries by first exploring the application behavior and then substituting the dynamic memory allocations. This makes this proposal valuable even for end-users who do not have the possibility of modifying the application source code. We demonstrate the value of a framework based in our methodology for several relevant HPC applications using different allocation strategies to help end-users improve performance with minimal intervention. The results of our evaluation reveal that our proposal is able to identify the key objects to be promoted into fast on-package memory in order to optimize performance, leading to even surpassing hardware-based solutions.

“…The MemAxes tool described in [2] provides visualizations of NUMA data in semantic contexts, including node hardware topology, source code, and application-specific. These visualizations assist the programmer in finding and fixing NUMA performance bottlenecks. The visualization part of the MemAxes tool has been released for general use, but not the measurement part, because the authors have written their own kernel module to access the low-level NUMA performance data.…”

Section: Related Work (mentioning)

confidence: 99%

A prototype sampling interface for PAPI

Lopez, Moore, Weaver. 2015. Proceedings of the 2015 XSEDE Conference on Scientific Advancements Enabled by Enhanced Cyberinfrastructure - XSEDE '15

PAPI is a widely used portable library for accessing hardware counters on modern microprocessors. PAPI offers both counting and sampling interfaces, but the sampling interface is extremely limited, consisting of a simple interrupt-driven interface that can periodically report processor state. In the past few years, the hardware and operating systems of modern processors have added support for new more advanced sampling features. These features enable information about non-uniform memory access (NUMA) behavior to be obtained. Currently, performance tool developers who want to provide sampling data to their users must make use of a complex low-level kernel interface, sometimes developing their own kernel patch to access the features they need. This paper reports on initial efforts to develop a middleware layer that will serve as a stable interface and enable tool developers to access sampling data through standard PAPI calls and to obtain data important for NUMA analysis.
