“…State-of-the-art techniques that combine distributed- and shared-memory programming models [80], as well as many PGAS approaches [6,24,47,48], have demonstrated the potential benefits of combining both levels of parallelism [81,82,39,83], including increased communication-computation overlap [84,85], improved memory utilization [86,87], power optimization [88], and effective use of accelerators [89,90,91,92]. A hybrid MPI-and-threads model, such as MPI combined with OpenMP, can take advantage of those optimized shared-memory algorithms and data structures.…”