Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems 2013
DOI: 10.1145/2530268.2530271

A study of application-level recovery methods for transient network faults

Abstract: With the increasing number of components in HPC systems, transient faults will become commonplace. Today, network transient faults, such as lost or corrupted network packets, are addressed by middleware libraries at the cost of high memory usage and packet retransmissions. These costs, however, can be eliminated using application-level fault tolerance. In this paper, we propose recovery methods for transient network faults at the application level. These methods reconstruct missing or corrupted data via interp…
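The abstract is cut off before it names the reconstruction technique, so the following is only an illustrative sketch of one possible application-level approach, not the authors' method. It assumes that lost or checksum-failed elements of a received buffer have already been marked as `None`, and it fills each gap by linear interpolation between the nearest valid neighbours instead of requesting a retransmission. The function name `recover_linear` is hypothetical.

```python
from typing import List, Optional


def recover_linear(values: List[Optional[float]]) -> List[float]:
    """Hypothetical helper: rebuild missing entries (None) in a received buffer.

    Interior gaps are filled by linear interpolation between the nearest valid
    neighbours; gaps at either edge copy the nearest valid value.
    """
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("no valid data to reconstruct from")

    out: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            out.append(v)  # element arrived intact, keep it
            continue
        # Nearest valid index on each side of the missing element.
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out.append(values[right])  # leading gap: copy right neighbour
        elif right is None:
            out.append(values[left])  # trailing gap: copy left neighbour
        else:
            t = (i - left) / (right - left)
            out.append(values[left] + t * (values[right] - values[left]))
    return out


if __name__ == "__main__":
    # Element 1 was lost in transit; it is estimated from elements 0 and 2.
    print(recover_linear([1.0, None, 3.0]))
```

The trade-off this illustrates is the one the abstract points to: the receiver avoids retransmission buffers and extra network traffic, at the price of an approximate rather than exact value for the lost data, which is only acceptable when the application's data is smooth enough to tolerate such estimates.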
