Shared memory architectures have recently experienced a large increase in thread-level parallelism, leading to complex memory hierarchies with multiple cache levels and memory controllers. These designs exhibit Non-Uniform Memory Access (NUMA) behavior, where the performance and energy consumption of a memory access depend on where the data is located in the memory hierarchy. Accesses to local caches or memory controllers are generally more efficient than accesses to remote ones. A common way to improve the locality and balance of memory accesses is to map threads to cores and data to memory controllers based on the affinity between threads and data. Such mapping techniques can operate at different hardware and software levels, which affects their complexity, applicability, and the resulting performance and energy gains. In this article, we introduce a taxonomy to classify different mapping mechanisms and provide a comprehensive overview of existing solutions.
In parallel architectures with Non-Uniform Memory Access (NUMA) behavior, the mapping of memory pages to NUMA nodes influences the performance of parallel applications. To improve on traditional data mapping policies, two basic strategies can be employed: optimizing the locality or the balance of memory accesses. In a locality-based policy, each memory page is mapped to the node that accesses it the most. In a balance-based policy, memory pages are mapped such that the number of memory accesses resolved by each memory controller is similar. In this paper, we perform an in-depth exploration of the impact of these data mapping policies on the performance of parallel applications. We introduce metrics that describe the memory access behavior of applications and evaluate their suitability for data mapping. We also present new mapping policies that focus on locality, balance, or both. These policies were evaluated on three different NUMA architectures with applications from the NAS-OMP and PARSEC benchmark suites. Results show that the performance improvements of each policy depend on the characteristics of the applications and machines. Choosing the wrong policy can hurt performance compared to the default first-touch mapping. Compared to traditional mapping policies and to policies that focus on only locality or only balance, taking both locality and balance into account results in the highest improvements. Furthermore, it avoids the performance reduction caused by a wrong data mapping.
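The two basic strategies can be illustrated with a minimal sketch. It is not the policies proposed in the paper: the function names, the per-page access-count matrix, and the greedy heuristic used for balancing are all assumptions made for illustration. Each row of `accesses` holds the number of accesses to one page from each NUMA node.

```python
from typing import List


def locality_mapping(accesses: List[List[int]]) -> List[int]:
    """Locality-based sketch: map each page to the node that accesses it most.

    accesses[p][n] is the (hypothetical) number of accesses to page p from node n.
    Ties are resolved in favor of the lowest node index.
    """
    return [max(range(len(row)), key=lambda n: row[n]) for row in accesses]


def balance_mapping(accesses: List[List[int]], num_nodes: int) -> List[int]:
    """Balance-based sketch (assumed greedy heuristic, not taken from the paper).

    Pages are visited from most to least accessed, and each page is placed on
    the node whose memory controller currently serves the fewest accesses.
    """
    load = [0] * num_nodes
    mapping = [0] * len(accesses)
    order = sorted(range(len(accesses)), key=lambda p: sum(accesses[p]), reverse=True)
    for p in order:
        node = min(range(num_nodes), key=lambda n: load[n])
        mapping[p] = node
        load[node] += sum(accesses[p])
    return mapping


def controller_load(accesses: List[List[int]], mapping: List[int], num_nodes: int) -> List[int]:
    """Total accesses resolved by each node's memory controller under a mapping."""
    load = [0] * num_nodes
    for p, node in enumerate(mapping):
        load[node] += sum(accesses[p])
    return load
```

Comparing `controller_load` for the two mappings on the same access matrix shows the trade-off: when most pages are accessed mainly from one node, the locality mapping concentrates the load on that node's controller, while the balance mapping spreads the load at the cost of more remote accesses.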