Adjoint methods are an efficient approach for computing gradient information. The favorable time complexity of adjoint computation, however, comes with a memory requirement that is in essence proportional to the operation count of the underlying function, as is the case, for example, when algorithmic differentiation is used to provide the adjoints. For this reason, several checkpointing approaches, including binomial checkpointing, have become popular. This paper analyzes an extension of checkpointing strategies that covers restarting the computation of adjoints. Such an extension is of special interest for long-running, parallel simulations executing on large-scale computing systems, since such simulations may be unable to complete the calculation of the adjoints within a maximal time allocation. We describe an exhaustive search to determine checkpointing strategies with minimal runtime when resilience is taken into account, analyze their structure, and show the resulting construction principle.
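To make the memory and runtime trade-off of binomial checkpointing concrete, the following minimal sketch evaluates the classical binomial bound: with c checkpoints and at most r forward evaluations of any time step, up to C(c + r, c) steps can be reversed. The function names `max_steps` and `min_repetitions` are hypothetical, chosen only for illustration, and the sketch covers the classical setting without restarts, not the resilience extension analyzed in this paper.

```python
from math import comb


def max_steps(checkpoints: int, repetitions: int) -> int:
    """Classical binomial bound: the largest number of time steps that can be
    reversed with `checkpoints` stored states when each step is evaluated
    forward at most `repetitions` times."""
    return comb(checkpoints + repetitions, checkpoints)


def min_repetitions(steps: int, checkpoints: int) -> int:
    """Smallest repetition number r such that max_steps(checkpoints, r) >= steps.

    Illustrative helper (an assumption of this sketch, not taken from the paper).
    """
    if checkpoints < 1:
        raise ValueError("at least one checkpoint is required")
    if steps < 1:
        raise ValueError("at least one time step is required")
    r = 0
    # The bound grows monotonically in r, so a linear scan terminates.
    while max_steps(checkpoints, r) < steps:
        r += 1
    return r


if __name__ == "__main__":
    # How many forward evaluations per step are needed to reverse
    # 1000 time steps as the number of available checkpoints grows?
    for c in (1, 2, 5, 10):
        print(f"checkpoints={c:2d}  repetitions={min_repetitions(1000, c)}")
```

The bound grows quickly in both arguments, which is why a modest number of checkpoints can keep the recomputation overhead small even for long time-stepping runs.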