2015 IEEE International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2015.82

Leveraging Naturally Distributed Data Redundancy to Reduce Collective I/O Replication Overhead

Abstract: Dumping large amounts of related data simultaneously to local storage devices instead of a parallel file system is a frequent I/O pattern of HPC applications running at large scale. Since local storage resources are prone to failures and have limited potential to serve multiple requests in parallel, techniques such as replication are often used to enable resilience and high availability. However, replication introduces overhead, both in terms of network traffic necessary to distribute replicas, as well as extr…
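To make the core idea concrete, the following minimal Python sketch plans replica transfers while crediting naturally duplicated blocks toward the replication factor. It illustrates the general principle rather than the paper's actual protocol; the function name plan_replication, the use of SHA-256 content hashes, and the toy data are assumptions of this sketch.

import hashlib
from collections import defaultdict

REPLICATION_FACTOR = 2  # desired number of copies of each distinct block

def plan_replication(blocks_by_node, r=REPLICATION_FACTOR):
    # Group nodes by the content hash of the block they hold, so that
    # naturally duplicated blocks count toward the replication factor.
    owners = defaultdict(list)
    for node, block in blocks_by_node.items():
        owners[hashlib.sha256(block).hexdigest()].append(node)
    # Schedule transfers only for blocks that still lack enough copies.
    transfers = []
    for digest, nodes in owners.items():
        missing = max(0, r - len(nodes))
        if missing:
            transfers.append((nodes[0], digest, missing))
    return transfers

# Toy run: nodes 0 and 1 happened to write identical data.
blocks = {0: b"A" * 4096, 1: b"A" * 4096, 2: b"B" * 4096}
print(plan_replication(blocks))  # only node 2's block needs an extra copy

Naive replication would move r - 1 copies of every block over the network; here, a block that several nodes already happen to hold needs fewer transfers, or none at all.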

Cited by 13 publications (4 citation statements). References 21 publications.
“…(ii) Some data replication strategies are based on techniques that achieve specific tenant SLOs. Examples of these techniques are: compression (Liu and Shen, 2016a) for data durability, or for reducing replication bandwidth (Xu et al., 2015), multi-failure resilient scheme (Liu and Shen, 2016b) for enhancing availability, de-duplication (Nicolae, 2015) for reducing data transfer, prefetching (Mansouri and Javidi, 2018), data migration (Mansouri and Buyya, 2019), parallel downloading (Mansouri et al., 2017), data mining (Hamrouni and Charrada, 2015), supervised learning (Bui et al., 2016), overheating similarity of nodes (Sun et al., 2018), partitioning (Zhou and Fan, 2017) for ensuring performance and fragmentation for optimal security (Ali et al., 2018). On the other hand, many corporations, e.g., Facebook, as well as many replication strategies (Bui et al., 2016) are based on the erasure coding technique rather than/in addition to data replication.…”
Section: Related Work
confidence: 99%
“…VELOC [24,31] takes this approach further by introducing asynchronous techniques to apply such complementary strategies in the background. When the checkpoints of different processes have similar content, techniques such as [20,21] can be applied to complement multi-level checkpointing. However, redundancy is detected on-the-fly, which can be an unnecessary overhead for clone and revisit (e.g., model replicas are known to be identical for data-parallel training).…”
Section: Related Work and Positioning
confidence: 99%
“…Deduplication is a common technique used in a variety of scenarios, both obvious (e.g., saving space in file systems [22], [11] or reducing the size of large-scale memory dumps [18]) and less obvious (e.g., detection of natural replicas to reduce the cost of replication-based resilience [19]).…”
Section: Related Work
confidence: 99%
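As a concrete illustration of the content-addressed deduplication mentioned in these citation statements, the sketch below stores each distinct fixed-size chunk once, keyed by its SHA-256 digest. The names dedup_store and restore and the 4 KiB chunk size are hypothetical choices for this example, not an API of any of the cited systems.

import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking, assumed here for simplicity

def dedup_store(data, store):
    # Store each distinct chunk once, keyed by its SHA-256 digest,
    # and return the "recipe" of digests needed to rebuild the data.
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # duplicate chunks cost nothing extra
        recipe.append(digest)
    return recipe

def restore(recipe, store):
    # Reassemble the original bytes from a recipe of chunk digests.
    return b"".join(store[d] for d in recipe)

store = {}
r1 = dedup_store(b"x" * 10000, store)              # highly redundant input
r2 = dedup_store(b"x" * 8192 + b"y" * 100, store)  # shares chunks with r1
assert restore(r1, store) == b"x" * 10000
print(len(store), "unique chunks for", len(r1) + len(r2), "chunk references")

Because identical chunks map to the same digest, redundant data costs only a recipe entry; production deduplicators typically use content-defined rather than fixed-size chunking, so that an insertion does not shift every subsequent chunk boundary.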