The efficient utilization of current supercomputing systems with deep storage hierarchies demands scientific applications that are capable of leveraging such heterogeneous hardware. Fault tolerance, and checkpointing in particular, can become one of the most time-consuming aspects of execution if not handled correctly. High checkpoint performance can be achieved with optimized multilevel checkpoint-and-restart libraries. Unfortunately, those libraries do not allow for restarts with a modified number of processes or for scientific post-processing of the checkpointed data. This is because they typically use an N-N checkpointing scheme and opaque file formats. In this article, we present a novel mechanism that asynchronously stores checkpoints in a self-describing file format and loads the data upon recovery with a different number of processes. We provide an API that defines the process-local data as part of a globally shared dataset. Our measurements demonstrate a low overhead, between 0.6% and 2.5%, for a 2.25 TB checkpoint written by 6K processes.
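The abstract does not show the API itself, so the following is only a conceptual sketch of the underlying idea: each rank describes its local buffer as a subregion of a global dataset, so offsets are derived from the global decomposition rather than the rank count at checkpoint time, and a restart may use a different number of processes. The sketch uses plain MPI-IO instead of the paper's library, and the global size, file name, and 1-D block decomposition are illustrative assumptions; a self-describing format would additionally record dataset names, types, and shapes.

```c
#include <mpi.h>
#include <stdlib.h>

#define GLOBAL_N 1048576  /* global 1-D dataset size; assumed for illustration */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank owns a contiguous slice of the global dataset.
       The (start, count) pair is computed from GLOBAL_N, not stored
       per rank, so any nprocs can reconstruct the decomposition. */
    MPI_Offset base  = GLOBAL_N / nprocs;
    MPI_Offset rem   = GLOBAL_N % nprocs;
    MPI_Offset mine  = base + (rank < rem ? 1 : 0);
    MPI_Offset start = (MPI_Offset)rank * base + (rank < rem ? rank : rem);

    double *local = malloc(mine * sizeof *local);
    for (MPI_Offset i = 0; i < mine; i++)
        local[i] = (double)(start + i);  /* stand-in payload */

    /* Checkpoint: all ranks write their slice into one shared file at
       the position the slice occupies in the global dataset (N-1,
       rather than the N-N scheme criticized above). */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "ckpt.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, start * (MPI_Offset)sizeof(double), local,
                          (int)mine, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* Restart, possibly with a different nprocs: recompute (start, mine)
       from GLOBAL_N and read the same global dataset back. */
    MPI_File_open(MPI_COMM_WORLD, "ckpt.bin", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_read_at_all(fh, start * (MPI_Offset)sizeof(double), local,
                         (int)mine, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(local);
    MPI_Finalize();
    return 0;
}
```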