Cloud²HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters

Zeydan, Engin; Arslan, Suayb S.

doi:10.1109/icin48450.2020.9059482

Cited by 5 publications

(4 citation statements)

References 13 publications


“…The heavy-tailedness of disk failure lifetime is quite well known and modeled in the literature [8]. One of the most critical trends is to use collected data (for instance, Backblaze provides such datasets for research [9]) to model data storage system reliability for predictive maintenance [10]. Later in the document (Section 6) we will demonstrate one specific use case of data for modeling complex systems to help with the conventional mathematical tools for estimating the dependability of storage systems.…”

Section: Disks In Real Life and Relevant Research (mentioning)

confidence: 99%

“…Suppose further that x of these requests are from the failed set, and k − x are from the available and operational ones. Due to sampling without replacement, the probability of that happening is given by the hypergeometric distribution¹⁰. In this particular condition, we need to wait for the x failed carriers to be repaired first; this waiting time is given by the maximum repair time and is typically not exponentially distributed.…”

Section: Time-dependent Carrier Repair (mentioning)

confidence: 99%

“…The present discussion only slightly changes in case such a class of codes is used instead. ¹⁰ Sampling with replacement would lead to a binomially distributed statistic instead.…”

Section: Time-dependent Carrier Repair (mentioning)

confidence: 99%
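
The distinction drawn in the two statements above (hypergeometric when sampling without replacement, binomial when sampling with replacement) can be illustrated with a minimal sketch. The statements give only k and x, so the population sizes `total` and `failed`, the function names, and the numbers used below are assumptions made purely for illustration and are not taken from the cited paper.

```python
from math import comb


def hypergeom_pmf(x, k, total, failed):
    """P(exactly x of k requests hit failed carriers), sampling WITHOUT replacement.

    total  -- hypothetical number of carriers in the system
    failed -- hypothetical number of carriers currently failed
    """
    if x < 0 or x > k or x > failed or k - x > total - failed:
        return 0.0
    return comb(failed, x) * comb(total - failed, k - x) / comb(total, k)


def binom_pmf(x, k, total, failed):
    """Same event when sampling WITH replacement (the case noted in the footnote)."""
    if x < 0 or x > k:
        return 0.0
    p = failed / total  # chance that a single request hits a failed carrier
    return comb(k, x) * p**x * (1 - p) ** (k - x)


if __name__ == "__main__":
    total, failed, k = 20, 4, 5  # illustrative values only
    for x in range(k + 1):
        h = hypergeom_pmf(x, k, total, failed)
        b = binom_pmf(x, k, total, failed)
        print(f"x={x}  hypergeometric={h:.4f}  binomial={b:.4f}")
```

With these illustrative numbers, the hypergeometric probability of x = 5 is exactly zero because only 4 carriers are failed, whereas the binomial model still assigns it a small positive probability.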


Durability and Availability of Erasure-Coded Storage Systems with Concurrent Maintenance

Arslan¹

2023

Preprint


The initial version of this document was written in 2014 for the sole purpose of providing the fundamentals of reliability theory and identifying the theoretical machinery for predicting the durability/availability of erasure-coded storage systems. Since the definition of a "system" is too broad, we specifically focus on warm and cold storage systems where the data is stored in a distributed fashion across different storage units, with or without continuous (full duty-cycle) operation.

The contents of this document are dedicated to a review of fundamentals, a few major improved stochastic models, and several contributions of my work relevant to the field. One of the interesting contributions of this document is the introduction of the most general form of Markov models for the estimation of mean time to failure. This work was later partially published in IEEE Transactions on Reliability. Very good approximations for the closed-form solutions of this general model are also investigated. Various storage configurations under different policies are compared using such advanced models. In a subsequent chapter, we also consider multi-dimensional Markov models to address detached drive-medium combinations such as those found in optical disk and tape storage systems. It is not hard to anticipate that such a system structure would most likely be part of future DNA storage libraries and hence find a plethora of interesting applications. This work is partially published in Elsevier Reliability and System Safety.

Simulation modeling for more accurate estimation is covered towards the end of the document, noting the deficiencies of the simplified canonical as well as the more complex Markov models, which are due mainly to the stationary and static nature of Markovianity.

Throughout the document, we focus on concurrently maintained systems, although the discussion changes only slightly for systems repaired one device at a time. The document is still under construction, and future versions may include newer models and novel approaches to enrich the present contents. Some background on probability and coding theory is expected; it is briefly reviewed at the beginning of the document.


“…Many researchers and industrial partners have also focused their efforts on analyzing real-world datasets. For example, there are plenty of works in the literature that concentrated on using Backblaze's public dataset to extract some useful models [21]-[28]. Other works such as [29] base their analysis (latent sector error analysis for reliability in this case) on proprietary data collected from production-ready storage systems.…”

Section: A Related Work (mentioning)

confidence: 99%

On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers

Arslan

Zeydan

2021

IEEE Trans. Rel.

Self Cite


It has become commonplace to observe frequent multiple disk failures in big data centers in which thousands of drives operate simultaneously. Disks are typically protected by replication or erasure coding to guarantee a predetermined reliability. However, in order to optimize data protection, real-life disk failure trends need to be modeled appropriately. The classical approach to modeling is to estimate the probability density function of failures using non-parametric estimation techniques such as Kernel Density Estimation (KDE). However, these techniques are suboptimal in the absence of the true underlying density function. Moreover, insufficient data may lead to overfitting. In this study, we propose to use a set of transformations to the collected failure data for almost perfect regression in the transform domain. Then, by inverse transformation, we analytically estimated the failure density through the efficient computation of moment generating functions and hence the density functions. Moreover, we developed a visualization platform to extract useful statistical information such as model-based mean time to failure. Our results indicate that for other heavy-tailed data, the complex Gaussian Hypergeometric Distribution (GHD) and the classical KDE approach can perform best if the overfitting problem can be avoided and the complexity burden is overcome. On the other hand, we show that the failure distribution exhibits a less complex Argus-like distribution after performing the Box-Cox transformation, up to appropriate scaling and shifting operations.


Failures Forecast in Monitoring Datacenter Infrastructure Through Machine Learning Techniques: A Systematic Review

Neto

Filho

2021

Computational Science and Its Applications – ICCSA 2021

