Predicting Disk Replacement towards Reliable Data Centers

Botezatu, Mirela; Giurgiu, Ioana; Bogojeska, Jasmina; Wiesmann, D.

doi:10.1145/2939672.2939699

Cited by 130 publications

(51 citation statements)

References 9 publications


“…The idea of using RNNs to capture intricate dependencies among various time cycles of sensor observations is emphasized in [167] for prognostic applications. Botezatu et al, came up with some rules for directly identifying the healthy or unhealthy state of a device in [168], employing a disk replacement prediction algorithm with changepoint detection applied to time series Backblaze data.…”

Section: Prognostics and Health Management (mentioning)

confidence: 99%

A review of deep learning with special emphasis on architectures, applications and recent trends

Sengupta

Basak

Saikia

et al. 2020

Knowledge-Based Systems



Deep learning has solved a problem that as little as five years ago was thought by many to be intractable: the automatic recognition of patterns in data, and it can do so with an accuracy that often surpasses that of human beings. It has solved problems beyond the realm of traditional, hand-crafted machine learning algorithms and captured the imagination of practitioners trying to make sense out of the flood of data that now inundates our society. As public awareness of the efficacy of deep learning increases so does the desire to make use of it. But even for highly trained professionals it can be daunting to approach the rapidly increasing body of knowledge produced by experts in the field. Where does one start? How does one determine if a particular Deep Learning model is applicable to their problem? How does one train and deploy such a network? A primer on the subject can be a good place to start. With that in mind, we present an overview of some of the key multilayer artificial neural networks that comprise deep learning. We also discuss some new automatic architecture optimization protocols that use multi-agent approaches. Further, since guaranteeing system uptime is becoming critical to many computer applications, we include a section on using neural networks for fault detection and subsequent mitigation. This is followed by an exploratory survey of several application areas where deep learning has emerged as a game-changing technology: anomalous behavior detection in financial applications or in financial time-series forecasting, predictive and prescriptive analytics, medical image processing and analysis, and power systems research.
The thrust of this review is to outline emerging areas of application-oriented research within the deep learning community as well as to provide a handy reference to researchers seeking to use deep learning in their work for what it does best: statistical pattern recognition with unparalleled learning capacity with the ability to scale with information.


“…Recent work on device health monitoring as evidenced in [9] reinforce the idea of using RNNs to capture intricate dependencies among sensor observations across time cycles of dynamic period range. In [10], the authors came up with disk replacement prediction algorithm with changepoint detection in time series Backblaze data and concluded some rules for directly identifying the state of a device: healthy or faulty. Aussel et al, [6] used the same dataset to perform hard drive failure prediction with SVM, RF and GBT and discussed their performances based on precision and recall.…”

Section: B Related Work (mentioning)

confidence: 99%

Mechanisms for Integrated Feature Normalization and Remaining Useful Life Estimation Using LSTMs Applied to Hard-Disks

Basak

Sengupta

Dubey

2019

2019 IEEE International Conference on Smart Computing (SMARTCOMP)


In this paper we focus on application of data-driven methods for remaining useful life estimation in components where past failure data is not uniform across devices, i.e. there is a high variance in the minimum and maximum value of the key parameters. The system under study is the hard disks used in a computing cluster. The data used for analysis is provided by Backblaze as discussed later. In the article, we discuss the architecture of the long short term neural network used and describe the mechanisms to choose the various hyper-parameters. Further, we describe the challenges faced in extracting effective training sets from highly unorganized and class-imbalanced big data and establish methods for online predictions with extensive data pre-processing, feature extraction and validation through online simulation sets with unknown remaining useful lives of the hard disks. Our algorithm performs especially well in predicting RUL near the critical zone of a device approaching failure. With the proposed approach we are able to predict whether a disk is going to fail in the next ten days with an average precision of 0.8435. We also show that the architecture trained on a particular model is generalizable and transferable as it can be used to predict RUL for devices in other models from the same manufacturer.


“…Backblaze data-center maintains the record of the hard disk and any failures encountered by hard disk's manufacturer and make. Most of the prior works [18] have performed failure detection rather than making failure prediction as we formulated in Section III. We used the data for the Seagate hard disk model ST4000DM000 from Jan 2014 to June 2015, as it has the largest number of observations and the data collection methods were changed thereafter.…”

Section: A Datasets (mentioning)

confidence: 99%

Two Birds with One Network: Unifying Failure Event Prediction and Time-to-failure Modeling

Aggarwal

Atan

Farahat

et al. 2018

2018 IEEE International Conference on Big Data (Big Data)


One of the key challenges in predictive maintenance is to predict the impending downtime of equipment with a reasonable prediction horizon so that countermeasures can be put in place. Classically, this problem has been posed in two different ways which are typically solved independently: (1) Remaining useful life (RUL) estimation as a long-term prediction task to estimate how much time is left in the useful life of the equipment and (2) Failure prediction (FP) as a short-term prediction task to assess the probability of a failure within a prespecified time window. As these two tasks are related, performing them separately is sub-optimal and might result in inconsistent predictions for the same equipment. In order to alleviate these issues, we propose two methods: Deep Weibull model (DW-RNN) and multi-task learning (MTL-RNN). DW-RNN is able to learn the underlying failure dynamics by fitting Weibull distribution parameters using a deep neural network, learned with a survival likelihood, without training directly on each task. While DW-RNN makes an explicit assumption on the data distribution, MTL-RNN exploits the implicit relationship between the long-term RUL and short-term FP tasks to learn the underlying distribution. Additionally, both our methods can leverage the non-failed equipment data for RUL estimation. We demonstrate that our methods consistently outperform baseline RUL methods that can be used for FP while producing consistent results for RUL and FP. We also show that our methods perform at par with baselines trained on the objectives optimized for either of the two tasks.
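The survival likelihood mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' DW-RNN: in that model a neural network is said to produce the Weibull parameters, whereas here `shape` and `scale` are plain numbers supplied by the caller. The function names `weibull_log_likelihood` and `failure_probability` are hypothetical and chosen only for this illustration. The sketch shows how a single Weibull fit can score both failed and non-failed (right-censored) equipment, and how the same parameters give a failure probability for a fixed window.

```python
import math


def weibull_log_likelihood(t, observed, shape, scale):
    """Log-likelihood of one unit under a Weibull model (illustrative only).

    t        -- time of failure, or time last seen alive if censored (t > 0)
    observed -- True if the failure was observed, False if right-censored
    shape    -- Weibull shape parameter k > 0 (assumed given)
    scale    -- Weibull scale parameter lambda > 0 (assumed given)
    """
    # log S(t) = -(t / scale) ** shape, the log of the survival function.
    log_survival = -((t / scale) ** shape)
    if not observed:
        # A censored unit only tells us it survived up to time t.
        return log_survival
    # log h(t) = log(k) - log(lambda) + (k - 1) * log(t / lambda)
    log_hazard = (
        math.log(shape)
        - math.log(scale)
        + (shape - 1.0) * (math.log(t) - math.log(scale))
    )
    # For an observed failure the density is f(t) = h(t) * S(t).
    return log_hazard + log_survival


def failure_probability(t, window, shape, scale):
    """P(failure in (t, t + window] | survived to t) = 1 - S(t + window) / S(t)."""
    exponent = (t / scale) ** shape - ((t + window) / scale) ** shape
    return 1.0 - math.exp(exponent)
```

Summing `weibull_log_likelihood` over all units and maximizing it with respect to the parameters is the standard way to fit such a model. Because `failure_probability` is derived from the same survival function, the window-based failure estimate and the time-to-failure distribution cannot contradict each other, which is the consistency property the abstract emphasizes.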
