Multithreaded runtime framework for parallel and adaptive applications

Thomadakis, Polykarpos; Tsolakis, Christos; Chrisochoides, Nikos

doi:10.1007/s00366-022-01713-7

Cited by 6 publications

(3 citation statements)

References 34 publications

“…Over-decomposition is used to decompose the data domain into more chunks than the number of PEs, allowing PREMA more flexibility to load balance workload and overlap latencies. The effectiveness of this approach has already been demonstrated in previous work for heterogeneous platforms [2,4]. In the context of heterogeneity, host-to-device, and device-to-host memory transfers are broken into pipelined pieces and can be overlapped much more easily with the following kernel invocations.…”

Section: Over-decomposition (mentioning)

confidence: 96%

“…2 A high-level representation of the heterogeneity-aware PREMA. The hardware devices/interfaces stand on the lower level and are utilized by integrating PREMA with MPI, PThreads, and Argobots (CPU-only; see [2], [4]), and the heterogeneous tasking framework (in the current work). On top of that stands the application, which leverages these capabilities through a simple but powerful interface.…”

Section: Heterogeneous Tasking Framework (mentioning)

confidence: 99%

“…In [2], we presented the Parallel Runtime Environment for Multicomputer Applications (PREMA), a scalable runtime system for distributed homogeneous platforms. It uses high-level abstractions to simplify distributed programming for dynamic and irregular applications.…”

Section: Introduction (mentioning)

confidence: 99%

Runtime Support for Performance Portability on Heterogeneous Distributed Platforms

Thomadakis, Chrisochoides

2023

Preprint

Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware. This shift in the computing ecosystem offers many opportunities for performance improvement; however, it also increases the complexity of programming for such architectures. This work introduces a runtime framework that enables effortless programming for heterogeneous systems while efficiently utilizing hardware resources. The framework is integrated within a distributed and scalable runtime system to facilitate performance portability across heterogeneous nodes. Along with the design, this paper describes the implementation and optimizations performed, achieving up to 300% improvement on a single device and linear scalability on a node equipped with four GPUs. The framework in a distributed memory environment offers portable abstractions that enable efficient inter-node communication among devices with varying capabilities. It delivers superior performance compared to MPI+CUDA by up to 20% for large messages while keeping the overheads for small messages within 10%. Furthermore, the results of our performance evaluation in a distributed Jacobi proxy application demonstrate that our software imposes minimal overhead and achieves a performance improvement of up to 40%. This is accomplished by the optimizations at the library level as well as by creating opportunities to leverage application-specific optimizations like over-decomposition.

Speculative anisotropic mesh adaptation on shared memory for CFD applications

Tsolakis, Chrisochoides

2024

Engineering with Computers

Efficient and robust anisotropic mesh adaptation is crucial for Computational Fluid Dynamics (CFD) simulations. The CFD Vision 2030 Study highlights the pressing need for this technology, particularly for simulations targeting supercomputers. This work applies a fine-grained speculative approach to anisotropic mesh operations. Our implementation exhibits more than 90% parallel efficiency on a multi-core node. Additionally, we evaluate our method within an adaptive pipeline for a spectrum of publicly available test-cases that includes both analytically derived and error-based fields. For all test-cases, our results are in accordance with published results in the literature. Support for CAD-based data is introduced, and its effectiveness is demonstrated on one of NASA’s High-Lift prediction workshop cases.

Toward runtime support for unstructured and dynamic exascale-era applications

Thomadakis, Chrisochoides

2023

J Supercomput
