Communication-Based Mapping Using Shared Pages

Diener, Matthias; Cruz, Eduardo H. M.; Navaux, Philippe O. A.

doi:10.1109/ipdps.2013.57

Cited by 23 publications

(17 citation statements)

References 16 publications

“…Our previous work [21] also uses page faults to determine the communication behavior of parallel applications. This work is limited to multithreaded applications, that is, parallel applications that share a single page table and does not support MPI applications.…”

Section: Mapping Of Multithreaded Applications (mentioning)

confidence: 99%

Communication-aware process and thread mapping using online communication detection

et al. 2015

Self Cite

“…CDSM represents an extension of our previous work [21], which was limited to multi-threaded applications whose communication behavior does not change during execution. We extended it with support for multi-process parallel applications, such as applications based on MPI, and added techniques to detect dynamic behavior during execution.…”

Section: Introduction (mentioning)

confidence: 99%
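The page-fault-based detection these statements describe can be illustrated with a small sketch. It is not the authors' kernel mechanism: it assumes a hypothetical log of `(thread_id, page_number)` fault records, counts the pages each pair of threads has both touched, and pairs threads greedily by that count. The names `communication_matrix` and `greedy_pairs` are invented for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def communication_matrix(page_accesses, num_threads):
    """Count shared pages per thread pair.

    page_accesses is a hypothetical fault log: an iterable of
    (thread_id, page_number) tuples.
    """
    sharers = defaultdict(set)
    for tid, page in page_accesses:
        sharers[page].add(tid)

    matrix = [[0] * num_threads for _ in range(num_threads)]
    for tids in sharers.values():
        # Every pair of threads that touched the same page is
        # treated as having communicated once through that page.
        for a, b in combinations(sorted(tids), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def greedy_pairs(matrix):
    """Pair threads greedily, heaviest communication first.

    A stand-in for a real mapping algorithm: each resulting pair
    would be placed on cores that share a cache.
    """
    n = len(matrix)
    candidates = sorted(
        ((matrix[a][b], a, b) for a in range(n) for b in range(a + 1, n)),
        reverse=True,
    )
    placed, pairs = set(), []
    for _weight, a, b in candidates:
        if a not in placed and b not in placed:
            pairs.append((a, b))
            placed.update((a, b))
    return pairs


log = [(0, 10), (1, 10), (0, 11), (1, 11), (2, 20), (3, 20), (0, 20)]
matrix = communication_matrix(log, num_threads=4)
pairs = greedy_pairs(matrix)
```

With this log, threads 0 and 1 share two pages and end up paired, which leaves threads 2 and 3 as the second pair.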

“…The proposal has been implemented as a user-level scheduler for the Linux operating system. Diener et al [12] have presented a mechanism to dynamically map threads to machine cores. This technique detects the communication pattern of the application by monitoring a table that logs page accesses.…”

Section: Related Work (mentioning)

confidence: 99%

Compiler support for selective page migration in NUMA architectures

Piccoli, Santos, Rodrigues et al. 2014

Proceedings of the 23rd International Conference on Parallel Architectures and Compilation

Current high-performance multicore processors provide users with a non-uniform memory access (NUMA) model. These systems perform better when threads access data on memory banks next to the core where they run. However, ensuring data locality is difficult. In this paper, we propose compiler analyses and code generation methods to support a lightweight runtime system that dynamically migrates memory pages to improve data locality. Our technique combines static and dynamic analyses and is capable of identifying the most promising pages to migrate. Statically, we infer the size of arrays, plus the amount of reuse of each memory access instruction in a program. These estimates rely on a simple, yet accurate, trip count predictor of our own design. This knowledge lets us build templates of dynamic checks, to be filled with values known only at runtime. These checks determine when it is profitable to migrate data closer to the processors where this data is used. Our static analyses are quadratic in the number of variables in a program, and the dynamic checks are O(1) in practice. Our technique requires no user intervention, no third-party middleware, and no modifications to the operating system's kernel. We have applied our technique to several parallel algorithms, which are completely oblivious to the asymmetric memory topology, and have observed speedups of up to 4x compared to static heuristics. We compare our approach against Minas, a middleware that supports NUMA-aware data allocation, and show that we can outperform it by up to 50% in some cases.

“…Previous research [11,13] has shown that memory access information can be gathered by analyzing the page faults of parallel applications. We adapt the idea of these mechanisms for kMAF to determine the memory access behavior.…”

Section: Determine Memory Access Behavior (mentioning)

confidence: 99%

“…Most approaches focus on either thread affinity [2,13,14] or data affinity [1,12,28,34], but perform them only separately. Some mechanisms rely on execution traces [14,28], which cause a high overhead [3] and can not be used if the behavior of the application changes between executions.…”

Section: Introduction (mentioning)

confidence: 99%

kMAF

Diener, Cruz, Navaux et al. 2014

Proceedings of the 23rd International Conference on Parallel Architectures and Compilation

Self Cite

One of the main challenges for parallel architectures is the increasing complexity of the memory hierarchy, which consists of several levels of private and shared caches, as well as interconnections between separate memories in NUMA machines. To make full use of this hierarchy, it is necessary to improve the locality of memory accesses by reducing accesses to remote caches and memories, and using local ones instead. Two techniques can be used to increase the memory access locality: executing threads and processes that access shared data close to each other in the memory hierarchy (thread affinity), and placing the memory pages they access on the NUMA node they are executing on (data affinity). Most related work in this area focuses on either thread or data affinity, but not both, which limits the improvements. Other mechanisms require expensive operations, such as memory access traces or binary analysis, require changes to hardware or work only on specific parallel APIs.

In this paper, we introduce kMAF, a mechanism that automatically manages thread and data affinity on the kernel level. The memory access behavior of the running application is determined during its execution by analyzing its page faults. This information is used by kMAF to migrate threads and memory pages, such that the overall memory access locality is optimized. Extensive evaluation with 27 benchmarks from 4 benchmark suites shows substantial performance improvements, with results close to an oracle mechanism. Execution time was reduced by up to 35.7% (13.8% on average), while energy efficiency was improved by up to 34.6% (9.3% on average).
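The data affinity idea in this abstract, placing each page on the NUMA node whose threads access it, can be sketched as follows. This is a simplified illustration and not kMAF's kernel implementation: it assumes a hypothetical list of `(thread_id, page_number)` fault records and a known thread-to-node mapping, and the function names are invented for the example.

```python
from collections import Counter, defaultdict


def preferred_nodes(faults, thread_node):
    """Pick, for each page, the NUMA node that faulted on it most often.

    faults: iterable of (thread_id, page_number) tuples (hypothetical log).
    thread_node: dict mapping thread_id to the node it currently runs on.
    """
    per_page = defaultdict(Counter)
    for tid, page in faults:
        per_page[page][thread_node[tid]] += 1

    # Iterating nodes in sorted order makes ties resolve to the
    # lowest node id, because max() keeps the first maximum it sees.
    return {
        page: max(sorted(counts), key=counts.__getitem__)
        for page, counts in per_page.items()
    }


def pages_to_migrate(current_node, preferred):
    """Return only the pages whose current node differs from the preferred one."""
    return {
        page: node
        for page, node in preferred.items()
        if current_node.get(page) != node
    }


faults = [(0, 1), (0, 1), (1, 1), (1, 2)]
thread_node = {0: 0, 1: 1}
preferred = preferred_nodes(faults, thread_node)
moves = pages_to_migrate({1: 1, 2: 1}, preferred)
```

Here page 1 is accessed mostly from node 0 but resides on node 1, so it is the only page selected for migration. A real system would also have to weigh migration cost against the expected benefit, which this sketch ignores.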
