Improving the memory access locality of hybrid MPI applications

Diener, Matthias; White, Sam; Kalé, Laxmikant V.; Campbell, Michael T.; Bodony, Daniel J.; Freund, Jonathan B.

doi:10.1145/3127024.3127038

Cited by 11 publications

(9 citation statements)

References 39 publications


“…In both solutions, scaling was much better than executing the pure MPI‐based implementation. With this experiment, Diener et al observed the influence of memory locality on load balance. A better locality of reference leads to lower stall time due to memory access, resulting in reasonable use of the resources. The same authors reported that AMPI leads to similar locality gains, achieving better performance and scalability than the simplistic hybrid MPI/OpenMP and pure MPI‐based implementations.…”

Section: Related Work (mentioning)

confidence: 83%

“…For a similar purpose as other investigations, this study analyzed hybrid configurations to evaluate the impact of message passing operations for a large number of processes. Similar to investigations conducted in previous publications, this study evaluated the performance of a hybrid MPI/OpenMP approach concerning load‐balancing and synchronization techniques.…”

Section: Related Work (mentioning)

confidence: 93%

“…Adaptive message passing interface (AMPI) supports dynamic load balancing, processor virtualization, and fault tolerance for MPI applications. Diener et al investigated how to improve the memory access locality of a hybrid MPI/OpenMP multiphysics simulation application in two different strategies: manually fixing locality (NUMA) issues and by using the AMPI runtime environment. Both strategies showed different trade‐offs.…”

Section: Related Work (mentioning)

confidence: 99%

“…Similar to the publications described in this section, the present study evaluated parallel models of methods when running on multicore and manycore architectures. Along the same lines of the investigations conducted in previous publications, we implemented hybrid approaches to reduce the NUMA effect. For a similar purpose as other investigations, this study analyzed hybrid configurations to evaluate the impact of message passing operations for a large number of processes.…”

Section: Related Work (mentioning)

confidence: 99%


An evaluation of MPI and OpenMP paradigms in finite‐difference explicit methods for PDEs on shared‐memory multi‐ and manycore systems

Cabral, Oliveira, Osthoff, et al. 2019

Concurrency and Computation

Summary: This paper focuses on parallel implementations of three two‐dimensional explicit numerical methods on Intel® Xeon® Scalable Processor and the coprocessor Knights Landing. In this study, the performance of a hybrid parallel programming approach with message passing interface (MPI) and Open Multi‐Processing (OpenMP) and a pure MPI implementation used with two thread binding policies is compared with an improved OpenMP‐based implementation in three explicit finite‐difference methods for solving partial differential equations on shared‐memory multicore and manycore systems. Specifically, the improved OpenMP‐based version is a strategy that synchronizes adjacent threads and eliminates the implicit barriers of a naïve OpenMP‐based implementation. The experiments show that the most suitable approach depends on several characteristics related to the nonuniform memory access (NUMA) effect and load balancing, such as the size of the MPI domain and the number of synchronization points used in the parallel implementation. In algorithms that use four and five synchronization points, hybrid MPI/OpenMP approaches yielded better speedups than the other versions did in runs performed on both systems. The pure MPI‐based strategy, however, achieved better results than the other proposed approaches did in the method that employs only one synchronization point.


“…Work with similar goals has already been carried out for other applications, such as [Diener et al 2017], a study that improves memory access in hybrid MPI/OpenMP applications and compares the result with the standard and pure MPI versions, and [Bassi et al 2016], which presents a hybrid MPI/OpenMP parallelization model for the Discontinuous Galerkin method. Both works also develop the MPI/OpenMP optimization on shared-memory architectures.…”

Section: Related Work (unclassified)

Otimização do Método HOPMOC 1D com auxílio das ferramentas Intel Parallel Studio

Costa, Cabral, Osthoff 2019

Anais Estendidos Do XX Simpósio Em Sistemas Computacionais De Alto Desempenho (SSCAD Estendido 2019)

This work presents a comparative study of different parallelization techniques used to increase the performance of the HOPMOC numerical method for solving hyperbolic partial differential equations in convection-diffusion problems. The goal is to evaluate the gains of two strategies developed from the original version of the code, both intended to reduce the time spent in synchronization barriers, and to compare them with each other. In addition, the work contributes a study not found in other publications involving HOPMOC: an analysis of the relationship between Spin Time and CPU Time to demonstrate the efficiency of the developed strategies.


Heterogeneous computing with OpenMP and Hydra

Diener, Kalé, Bodony 2020

Concurrency and Computation

Self-citation

Summary: High‐performance computing relies on accelerators (such as GPGPUs) to achieve fast execution of scientific applications. Traditionally, these accelerators have been programmed with specialized languages, such as CUDA or OpenCL. In recent years, OpenMP emerged as a promising alternative for supporting accelerators, providing advantages such as maintaining a single code base for the host and different accelerator types and providing a simple way to extend support for accelerators to existing application codes. Efficiently using this support requires solving several challenges, related to performance, work partitioning, and concurrent execution on multiple device types. In this article, we discuss our experiences with using OpenMP for accelerators and present performance guidelines. We also introduce a library, Hydra, that addresses several of the challenges of using OpenMP for such devices. We apply Hydra to a scientific application, PlasCom2, that has not previously been able to use accelerators. Experiments on three architectures show that Hydra results in performance gains of up to 10× compared with CPU‐only execution. Concurrent execution on the host and GPU resulted in additional gains of up to 20% compared to running on the GPU only.
