2020 23rd Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN) 2020
DOI: 10.1109/icin48450.2020.9059482

Cloud2HDD: Large-Scale HDD Data Analysis on Cloud for Cloud Datacenters

Abstract: The main focus of this paper is to develop a distributed large-scale data analysis platform for the open-source data of the Backblaze cloud datacenter, which consists of operational hard disk drive (HDD) information collected over an observable period of 2272 days (over 74 months). To carefully analyze the intrinsic characteristics of hard disk behavior, we have exploited a large volume of data and the benefits of the Hadoop ecosystem as our big data processing engine. In other words, we have utilized a special dist…

Citations: cited by 5 publications (4 citation statements)
References: 13 publications
“…The heavy-tailedness of disk failure lifetime is quite well known and modeled in the literature [8]. One of the most critical trends is to use collected data (for instance, Backblaze provides such datasets for research [9]) to model data storage system reliability for predictive maintenance [10]. Later in the document (Section 6) we will demonstrate one specific use case of data for modeling complex systems to help with the conventional mathematical tools for estimating dependability of storage systems.…”
Section: Disks In Real Life and Relevant Research (mentioning)
confidence: 99%
“…Suppose further that x of these requests are from the failed set, and k − x are from the available and operational ones. Due to sampling without replacement, the probability of that happening is given by the hypergeometric distribution. In this particular condition, we need to wait for the x failed carriers to be repaired first, a wait that is given by the maximum repair time and is typically not exponentially distributed.…”
Section: Time-dependent Carrier Repair (mentioning)
confidence: 99%
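
The hypergeometric probability mentioned in the citation statement above can be illustrated with a minimal sketch. The quoted text only names x (requests that hit failed carriers) and k (total requests); the population sizes `n_total` and `n_failed`, and the function name `hypergeom_pmf`, are assumptions introduced here for illustration and do not come from the citing paper.

```python
from math import comb


def hypergeom_pmf(x: int, n_total: int, n_failed: int, k: int) -> float:
    """Probability that exactly x of k requests land on failed carriers.

    Assumes (hypothetically) n_total carriers, n_failed of which are failed,
    and k requests drawn uniformly without replacement.
    """
    if not 0 <= n_failed <= n_total or not 0 <= k <= n_total:
        raise ValueError("need 0 <= n_failed <= n_total and 0 <= k <= n_total")
    n_ok = n_total - n_failed
    # Outcomes outside the support have probability zero.
    if x < 0 or x > k or x > n_failed or k - x > n_ok:
        return 0.0
    # Ways to pick x failed and k - x operational carriers, over all ways to pick k.
    return comb(n_failed, x) * comb(n_ok, k - x) / comb(n_total, k)
```

For example, with 10 carriers of which 3 are failed and 4 requests, `hypergeom_pmf(1, 10, 3, 4)` evaluates to 0.5, since C(3,1)·C(7,3)/C(10,4) = 105/210.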
“…Many researchers and industrial partners have also focused their efforts on analyzing real-world datasets. For example, there are plenty of works in the literature that concentrated on using Backblaze's public dataset to extract some useful models [21]-[28]. Other works such as [29] base their analysis (latent sector error analysis for reliability in this case) on proprietary data collected from production-ready storage systems.…”
Section: A Related Work (mentioning)
confidence: 99%