I. WHY GPU-BASED CHECKPOINT COMPRESSION?

Checkpoint/restart protocols periodically record the address space state of all processes in an application execution instance to stable storage. Upon failures, new incarnations of failed processes are recovered from the failed processes' most recent checkpoints. Various strategies have been explored for improving checkpoint/restart efficiency, including strategies that hide or reduce checkpoint commit latencies, for example by reducing checkpoint sizes. One such optimization, increment-based checkpointing, saves only the changes in a process's address space between subsequent checkpoints.

In previous work, we developed a checkpoint compression viability model based on compression factor, compression speed and I/O bandwidth that determines when checkpoint data compression yields performance improvements [1]. We evaluated the impact of checkpoint compression on overall application performance using an extension of Daly's model. This evaluation was based on CPU-based checkpoint compression performance and demonstrated that checkpoint data compression can significantly improve an application's makespan. Now, we compare compression-based and increment-based optimizations and begin to explore how GPU-based checkpoint compression might further improve checkpoint/restart performance. Questions we wish to answer include:

• How do compression-based and increment-based checkpoint optimizations compare?
• Does the combination of compression-based and increment-based optimizations yield further improvements?
• Can faster, GPU-based compression algorithms improve checkpoint compression viability and, as a result, improve application makespan?
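The viability model mentioned above relates compression factor, compression speed and I/O bandwidth. The sketch below illustrates one plausible form of such a check; it is not the exact formulation from [1]. It assumes that the compression factor is the fraction of the checkpoint removed by compression (1 minus compressed size over uncompressed size), that compression speed is measured in uncompressed bytes per second, and that compression and writing happen one after the other rather than overlapped. The function names `commit_time` and `compression_is_viable` are hypothetical.

```python
def commit_time(size_bytes, compression_factor, compression_speed, io_bandwidth):
    """Seconds to compress a checkpoint and then write the compressed data.

    Assumptions (illustrative, not taken from [1]):
      compression_factor = 1 - compressed_size / uncompressed_size, in [0, 1)
      compression_speed  = uncompressed bytes consumed per second
      io_bandwidth       = bytes written to stable storage per second
      compression and I/O are serialized, not overlapped
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if compression_speed <= 0 or io_bandwidth <= 0:
        raise ValueError("speeds must be positive")
    if not 0.0 <= compression_factor < 1.0:
        raise ValueError("compression_factor must be in [0, 1)")
    compress_seconds = size_bytes / compression_speed
    write_seconds = size_bytes * (1.0 - compression_factor) / io_bandwidth
    return compress_seconds + write_seconds


def compression_is_viable(compression_factor, compression_speed, io_bandwidth):
    """True when compress-then-write is faster than writing the raw checkpoint.

    Per byte: 1/speed + (1 - factor)/bandwidth < 1/bandwidth,
    which simplifies to speed * factor > bandwidth.
    """
    if compression_speed <= 0 or io_bandwidth <= 0:
        raise ValueError("speeds must be positive")
    if not 0.0 <= compression_factor < 1.0:
        raise ValueError("compression_factor must be in [0, 1)")
    return compression_speed * compression_factor > io_bandwidth


if __name__ == "__main__":
    # Made-up numbers for illustration only: 1 GB checkpoint,
    # 100 MB/s compressor, 50 MB/s of I/O bandwidth per process.
    size = 1e9
    raw_seconds = size / 50e6
    compressed_seconds = commit_time(size, 0.8, 100e6, 50e6)
    print(raw_seconds, compressed_seconds)
    print(compression_is_viable(0.8, 100e6, 50e6))
```

Under these assumptions, a faster compressor, such as a GPU-based one, widens the range of I/O bandwidths for which compression pays off, because the left-hand side of the inequality grows with compression speed.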
II. METHODOLOGY

We collected checkpoint compression performance data using the following setup (for detailed references about our experimental setup, we refer to our previous study [1]):

• Applications: We performed our experiments with a set of mini apps from the Mantevo Project, namely HPCCG, pHPCCG, phdMesh and miniFE, along with LAMMPS, a key simulation workload for the Department of Energy.
• Checkpoint Libraries: We used BLCR as our system-level checkpoint library to generate checkpoints at short intervals distributed uniformly over the application runs. We also generated checkpoints using LAMMPS' built-in checkpoint capability.
• Compression Utilities: We chose popular compression tools from Linux's software stack (for example, parallel bzip, bzip, zip, rzip and 7zip) and a parallel CUDA-based compression algorithm, GFC [2], as our GPU-based compression routine.

We fed the collected data into our application efficiency model, which now includes increment-based checkpointing. The modified model takes two additional parameters: the number of increment-based checkpoints between two full checkpoints, and the ratio between the size of an increment-based checkpoint and a full checkpoint. We assume each checkpoint increment is one fifth the size of a regular checkpoint, an optimal number of increments between check...