High Performance Data Persistence in Non-Volatile Memory for Resilient High Performance Computing
Yingchao Huang,
Kai Wu,
Dong Li
Abstract:Resilience is a major design goal for HPC. Checkpoint is the most common method to enable resilient HPC. Checkpoint periodically saves critical data objects to non-volatile storage to enable data persistence. However, using checkpoint, we face dilemmas between resilience, recomputation and checkpoint cost.e reason that accounts for the dilemmas is the cost of data copying inherent in checkpoint. In this paper we explore how to build resilient HPC with non-volatile memory (NVM) as main memory and address the di… Show more
Set email alert for when this publication receives citations?
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.