Hugo Taboada scite author profile

Hugo Taboada

3 Publications

5 Citation Statements Received

43 Citation Statements Given


Affiliations

CEA DAM Île-de-France, Institut Polytechnique de Bordeaux, Laboratoire Bordelais de Recherche en Informatique

Publications


Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor

Denis, Jaeger, Jeannot, et al., 2019

The International Journal of High Performance Computing Applications


To amortize the cost of MPI collective operations, nonblocking collectives have been proposed to allow communications to be overlapped with computation. Unfortunately, collective communications are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running collective communications on the application cores leads to no overlap. In this article, we propose placement algorithms for progress threads running on cores dedicated to communications, so as to obtain communication/computation overlap without degrading performance. We first show that even simple collective operations, such as those based on a chain topology, are not straightforward to progress in the background on a dedicated core. Then, we propose an algorithm for tree-based collective operations that splits the tree between communication cores and application cores. To get the best of both worlds, the algorithm runs the short but heavy part of the tree on application cores and the long but narrow part of the tree on one or several communication cores, so as to reach a trade-off between overlap and absolute performance. We provide a model to study and predict its behavior and to tune its parameters. We implemented both algorithms in the MultiProcessor Computing (MPC) framework, a thread-based MPI implementation. We ran benchmarks on manycore processors such as the Intel Knights Landing (KNL) and Skylake and obtained good results for both performance and overlap.
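To make the "short but heavy" versus "long but narrow" trade-off concrete, here is a hypothetical sketch, not the authors' algorithm or model. It assumes a broadcast tree built as a fan-out from the root (the part assumed to run on application cores) whose branches continue as chains (the part assumed to be progressed by a communication core), and a toy cost model in which every send costs one step and a rank issues its sends one after another. The names `build_split_tree`, `completion_step` and the `fanout` parameter are illustrative inventions.

```python
def build_split_tree(n: int, fanout: int) -> dict[int, list[int]]:
    """Hypothetical hybrid broadcast tree over ranks 0..n-1, rooted at rank 0.

    Fan-out part (assumed to run on application cores): the root sends
    directly to up to `fanout` chain heads, which is short but CPU-heavy.
    Chain part (assumed to be progressed by a communication core): each head
    forwards the message along a chain, which is long but narrow.
    """
    if n < 1 or fanout < 1:
        raise ValueError("n and fanout must be >= 1")
    children: dict[int, list[int]] = {r: [] for r in range(n)}
    others = list(range(1, n))
    # Deal the non-root ranks round-robin into `fanout` chains.
    chains = [others[i::fanout] for i in range(fanout)]
    for chain in chains:
        if not chain:
            continue  # more chains than ranks: leave this one empty
        children[0].append(chain[0])  # fan-out edge, sent by the root
        for parent, child in zip(chain, chain[1:]):
            children[parent].append(child)  # chain edge, forwarded in the background
    return children


def completion_step(children: dict[int, list[int]]) -> int:
    """Step at which the last rank receives the data.

    Toy cost model: every send costs one step, and a rank issues its sends
    one after another, starting right after it has received the data.
    """
    received = {0: 0}
    pending = [0]
    while pending:
        rank = pending.pop()
        for i, child in enumerate(children[rank]):
            received[child] = received[rank] + i + 1
            pending.append(child)
    return max(received.values())


if __name__ == "__main__":
    n = 65
    for fanout in (1, 2, 4, 8, 16, 64):
        tree = build_split_tree(n, fanout)
        root_sends = len(tree[0])
        print(f"fanout={fanout:2d}  root sends={root_sends:2d}  "
              f"chain sends={n - 1 - root_sends:2d}  steps={completion_step(tree)}")
```

Under this toy model only, a fan-out of 8 over 65 ranks completes in 15 steps, against 19 for fan-outs of 4 or 16: widening the heavy part shortens the chains but makes the root send more. The paper's own model and tuning parameters may differ.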


Progress Thread Placement for Overlapping MPI Non-blocking Collectives Using Simultaneous Multi-threading

Denis, Jaeger, Taboada, 2018


Nonblocking collectives have been proposed to allow communications to be overlapped with computation in order to amortize the cost of MPI collective operations. To obtain a good overlap ratio, communications and computation have to run in parallel. Different hardware and software techniques exist to achieve this; dedicating some cores to run progress threads is one of them. However, some CPUs provide Simultaneous Multi-Threading (SMT), which is the ability of a core to run multiple hardware threads simultaneously, sharing the same arithmetic units. Our idea is to use these hardware threads to run progress threads, avoiding the allocation of dedicated cores. We ran benchmarks on Haswell processors, using their Hyper-Threading capability, and obtained good results for both performance and overlap, but only when MPI processes communicate across nodes. However, we also show that enabling Simultaneous Multi-Threading for intra-node communications leads to poor performance due to cache effects.
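The SMT idea can be illustrated with a minimal sketch, assuming Linux and its sysfs CPU topology files: find another hardware thread on the same physical core as an application thread, and pin a progress thread to it instead of to a dedicated core. The function names and the `progress_loop` callback (standing in for whatever polls the MPI library) are hypothetical; a real MPI runtime would do this in C inside its own threading layer, and this sketch does nothing about the intra-node cache effects reported above.

```python
import os
import threading
from collections.abc import Callable
from pathlib import Path
from typing import Optional


def parse_cpu_list(text: str) -> list[int]:
    """Parse Linux cpulist syntax such as '0,4' or '0-1,8-9'."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def smt_sibling(cpu: int) -> Optional[int]:
    """Return another hardware thread on the same core as `cpu`, or None.

    Reads the Linux sysfs topology file; returns None when the file is
    missing (non-Linux system, unknown CPU) or the core has no SMT sibling.
    """
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    try:
        siblings = parse_cpu_list(path.read_text())
    except OSError:
        return None
    others = [c for c in siblings if c != cpu]
    return others[0] if others else None


def start_progress_thread(app_cpu: int,
                          progress_loop: Callable[[], None]) -> Optional[threading.Thread]:
    """Run the hypothetical `progress_loop` on the SMT sibling of `app_cpu`."""
    sibling = smt_sibling(app_cpu)
    if sibling is None:
        return None  # no sibling hardware thread: a dedicated core would be needed

    def run() -> None:
        # On Linux, pid 0 in sched_setaffinity refers to the calling thread,
        # so this pins only the progress thread, not the whole process.
        os.sched_setaffinity(0, {sibling})
        progress_loop()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

Returning `None` when no sibling exists leaves the fallback (a dedicated core, or no progress thread at all) to the caller, which mirrors the choice the abstract contrasts SMT against.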


Dynamic Placement of Progress Thread for Overlapping MPI Non-blocking Collectives on Manycore Processor

Denis, Jaeger, Jeannot, et al., 2018


