Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing 2018
DOI: 10.1145/3208040.3208050

Improving performance of iterative methods by lossy checkpointing

Abstract: Iterative methods are commonly used to solve large, sparse linear systems, a fundamental operation in many modern scientific simulations. When large-scale iterative methods run in parallel on a large number of ranks, they must periodically checkpoint their dynamic variables to survive unavoidable fail-stop errors, which requires fast I/O systems and large storage space. Significantly reducing the checkpointing overhead is therefore critical to improving the overall performance…
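The central idea of the abstract, storing the solver's dynamic variables with a lossy compressor under a user-set error bound, can be illustrated with a minimal sketch. It assumes a plain uniform quantizer with an absolute error bound; it is not the compressor evaluated in the paper, and `compress`/`decompress` are hypothetical names.

```python
import math

def compress(values, abs_err):
    """Map each value to an integer bin of width 2*abs_err.

    Rounding to the nearest bin centre keeps the reconstruction error
    at most abs_err (up to floating-point rounding)."""
    if abs_err <= 0:
        raise ValueError("abs_err must be positive")
    width = 2.0 * abs_err
    codes = []
    for v in values:
        if not math.isfinite(v):
            raise ValueError("cannot quantize a non-finite value")
        codes.append(round(v / width))  # nearest integer bin index
    return codes

def decompress(codes, abs_err):
    """Reconstruct approximate values from integer bin indices."""
    width = 2.0 * abs_err
    return [c * width for c in codes]

if __name__ == "__main__":
    x = [math.sin(0.01 * i) for i in range(1000)]  # stand-in for an iterate
    codes = compress(x, 1e-3)
    x_restored = decompress(codes, 1e-3)
    print("max error:", max(abs(a - b) for a, b in zip(x, x_restored)))
```

Real error-bounded compressors typically add a prediction step before quantization and then entropy-code the resulting integers; this sketch omits both and only shows how the error bound is enforced.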

Cited by 33 publications (14 citation statements)
References: 52 publications
“…Implement a mechanism that accelerates the transfer rate, by improving I/O, use incremental checkpoints and/or compress [16] the checkpoints even more.…”
Section: Discussion (mentioning; confidence: 99%)
“…Second, the performance degradation due to the extra application iterations must be mitigated by the time saved on I/O and storage operations. Third, in order to optimize the overall execution performance in the presence of failures, one needs to calculate the checkpoint intervals based on the revised performance model proposed in (Tao et al, 2018). According to the model, the compression and decompression speeds need to be able to be estimated based on the (compressed) checkpoint size and user-set compression error bound.…”
Section: Accelerating Checkpoint/restart (mentioning; confidence: 99%)
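The passage above notes that checkpoint intervals should come from a performance model in which compression and decompression speeds are estimated from the checkpoint size and error bound. That revised model (Tao et al., 2018) is not reproduced here. As a stand-in, the sketch below plugs a compression-aware checkpoint cost into Young's classic first-order interval formula, sqrt(2 · C · MTBF). All numbers (checkpoint size, I/O bandwidth, compression ratio and throughput, MTBF) and all function names are hypothetical.

```python
import math

def checkpoint_cost(raw_bytes, ratio, compress_bps, io_bps):
    """Seconds to compress a checkpoint, then write the compressed bytes.

    Use ratio=1.0 and compress_bps=math.inf for an uncompressed checkpoint."""
    return raw_bytes / compress_bps + (raw_bytes / ratio) / io_bps

def young_interval(ckpt_cost_s, mtbf_s):
    """Young's first-order optimal checkpoint interval: sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * ckpt_cost_s * mtbf_s)

def overhead_fraction(ckpt_cost_s, interval_s, mtbf_s):
    """First-order waste: checkpoint time per interval plus, on average,
    half an interval of lost work per failure. Ignores restart/recovery
    time and any extra solver iterations caused by lossy restarts."""
    return ckpt_cost_s / interval_s + interval_s / (2.0 * mtbf_s)

if __name__ == "__main__":
    mtbf = 3600.0  # hypothetical: one failure per hour
    raw = 1e9      # hypothetical 1 GB checkpoint
    for name, ratio, speed in [("uncompressed", 1.0, math.inf),
                               ("lossy", 20.0, 5e8)]:
        c = checkpoint_cost(raw, ratio, speed, io_bps=1e8)
        t = young_interval(c, mtbf)
        print(name, c, t, overhead_fraction(c, t, mtbf))
```

A cheaper checkpoint shortens the optimal interval and lowers the first-order overhead, but, as the passage stresses, this saving only pays off if it outweighs the extra iterations a lossy restart may cost, which this simple model does not capture.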
“…For demonstration purposes, we use the sparse matrix arising from discretizing a 3-D Poisson’s equation. We refer readers to (Tao et al, 2018) for the matrix details. We use the PETSc (v3.8) (Balay et al, 2018) library for GMRES and its default preconditioner (block Jacobi with ILU/IC).…”
Section: Lossy Compression Use Cases (mentioning; confidence: 99%)
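The setup quoted above (a 3-D Poisson matrix solved with PETSc GMRES and a block Jacobi preconditioner) can be approximated in dependency-free Python to show why restarting from a lossy checkpoint is viable. This sketch makes several substitutions, all assumptions: a matrix-free 7-point Laplacian on an n×n×n grid with Dirichlet boundaries (the paper's exact matrix is described in Tao et al., 2018), unpreconditioned conjugate gradient in place of GMRES (valid here because this matrix is symmetric positive definite), and a simple rounding quantizer standing in for the lossy compressor. Function names, grid size and error bound are hypothetical.

```python
import math

# Offsets of the six face neighbours in the 7-point stencil.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def poisson3d_matvec(x, n):
    """y = A x for the 7-point Laplacian on an n*n*n grid (Dirichlet BCs):
    diagonal 6, and -1 for each neighbour that lies inside the grid."""
    y = [0.0] * (n * n * n)
    for k in range(n):
        for j in range(n):
            for i in range(n):
                s = 6.0 * x[(k * n + j) * n + i]
                for di, dj, dk in NEIGHBOURS:
                    ii, jj, kk = i + di, j + dj, k + dk
                    if 0 <= ii < n and 0 <= jj < n and 0 <= kk < n:
                        s -= x[(kk * n + jj) * n + ii]
                y[(k * n + j) * n + i] = s
    return y

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cg(b, n, x0, tol=1e-8, max_iter=1000):
    """Unpreconditioned conjugate gradient; returns (x, iterations).
    In this sketch a restart needs only x: r and p are rebuilt from it."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, poisson3d_matvec(x, n))]
    p = list(r)
    rr = dot(r, r)
    bnorm = math.sqrt(dot(b, b))
    it = 0
    while math.sqrt(rr) > tol * bnorm and it < max_iter:
        ap = poisson3d_matvec(p, n)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        it += 1
    return x, it

if __name__ == "__main__":
    n = 8
    b = [1.0] * n ** 3
    x_ref, it_ref = cg(b, n, [0.0] * n ** 3)
    # Simulate a failure after 10 iterations, restoring x from a lossy checkpoint.
    x_ckpt, _ = cg(b, n, [0.0] * n ** 3, max_iter=10)
    err = 1e-4  # hypothetical absolute error bound
    x_lossy = [round(v / (2 * err)) * (2 * err) for v in x_ckpt]
    x_rest, it_rest = cg(b, n, x_lossy)
    print("uninterrupted:", it_ref, "iterations")
    print("restart after 10:", it_rest, "more iterations")
```

Because the restarted solver recomputes its residual from the restored iterate, the combined cost of the restart and the compression error shows up as extra iterations rather than as a wrong answer, which is the trade-off the citing passages describe.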