2015 IEEE International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2015.82

Leveraging Naturally Distributed Data Redundancy to Reduce Collective I/O Replication Overhead

Abstract: Dumping large amounts of related data simultaneously to local storage devices instead of a parallel file system is a frequent I/O pattern of HPC applications running at large scale. Since local storage resources are prone to failures and have limited potential to serve multiple requests in parallel, techniques such as replication are often used to enable resilience and high availability. However, replication introduces overhead, both in terms of network traffic necessary to distribute replicas, as well as extr…
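To make the core idea concrete, the following minimal Python sketch plans replica transfers while crediting naturally duplicated blocks toward the replication factor. It illustrates the general principle rather than the paper's actual protocol; the function name plan_replication, the use of SHA-256 content hashes, and the toy data are assumptions of this sketch.

import hashlib
from collections import defaultdict

REPLICATION_FACTOR = 2  # desired number of copies of each distinct block

def plan_replication(blocks_by_node, r=REPLICATION_FACTOR):
    # Group nodes by the content hash of the block they hold, so that
    # naturally duplicated blocks count toward the replication factor.
    owners = defaultdict(list)
    for node, block in blocks_by_node.items():
        owners[hashlib.sha256(block).hexdigest()].append(node)
    # Schedule transfers only for blocks that still lack enough copies.
    transfers = []
    for digest, nodes in owners.items():
        missing = max(0, r - len(nodes))
        if missing:
            transfers.append((nodes[0], digest, missing))
    return transfers

# Toy run: nodes 0 and 1 happened to write identical data.
blocks = {0: b"A" * 4096, 1: b"A" * 4096, 2: b"B" * 4096}
print(plan_replication(blocks))  # only node 2's block needs an extra copy

Naive replication would move r - 1 copies of every block over the network; here, a block that several nodes already happen to hold needs fewer transfers, or none at all.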

Cited by 13 publications (4 citation statements). References 21 publications.
“…(ii) Some data replication strategies are based on techniques that achieve specific tenant SLOs. Examples of these techniques are: compression (Liu and Shen, 2016a) for data durability, or for reducing replication bandwidth (Xu et al., 2015), multi-failure resilient scheme (Liu and Shen, 2016b) for enhancing availability, de-duplication (Nicolae, 2015) for reducing data transfer, prefetching (Mansouri and Javidi, 2018), data migration (Mansouri and Buyya, 2019), parallel downloading (Mansouri et al., 2017), data mining (Hamrouni and Charrada, 2015), supervised learning (Bui et al., 2016), overheating similarity of nodes (Sun et al., 2018), partitioning (Zhou and Fan, 2017) for ensuring performance and fragmentation for optimal security (Ali et al., 2018). On the other hand, many corporations, e.g., Facebook, as well as many replication strategies (Bui et al., 2016) are based on the erasure coding technique rather than/in addition to data replication.…”
Section: Related Work
confidence: 99%
“…VELOC [24,31] takes this approach further by introducing asynchronous techniques to apply such complementary strategies in the background. When the checkpoints of different processes have similar content, techniques such as [20,21] can be applied to complement multi-level checkpointing. However, redundancy is detected on-the-fly, which can be an unnecessary overhead for clone and revisit (e.g., model replicas are known to be identical for data-parallel training).…”
Section: Related Work and Positioning
confidence: 99%
“…Deduplication is a common technique used in a variety of scenarios, both obvious (e.g., saving space in file systems [22], [11] or reducing the size of large-scale memory dumps [18]) and less obvious (e.g., detection of natural replicas to reduce the cost of replication-based resilience [19]).…”
Section: Related Work
confidence: 99%
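As a concrete illustration of the content-addressed deduplication mentioned in these citation statements, the sketch below stores each distinct fixed-size chunk once, keyed by its SHA-256 digest. The names dedup_store and restore and the 4 KiB chunk size are hypothetical choices for this example, not an API of any of the cited systems.

import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking, assumed here for simplicity

def dedup_store(data, store):
    # Store each distinct chunk once, keyed by its SHA-256 digest,
    # and return the "recipe" of digests needed to rebuild the data.
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # duplicate chunks cost nothing extra
        recipe.append(digest)
    return recipe

def restore(recipe, store):
    # Reassemble the original bytes from a recipe of chunk digests.
    return b"".join(store[d] for d in recipe)

store = {}
r1 = dedup_store(b"x" * 10000, store)              # highly redundant input
r2 = dedup_store(b"x" * 8192 + b"y" * 100, store)  # shares chunks with r1
assert restore(r1, store) == b"x" * 10000
print(len(store), "unique chunks for", len(r1) + len(r2), "chunk references")

Because identical chunks map to the same digest, redundant data costs only a recipe entry; production deduplicators typically use content-defined rather than fixed-size chunking, so that an insertion does not shift every subsequent chunk boundary.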