Although some studies report high accuracy when a decision tree is used to classify and predict hard disk failure, that accuracy is questionable because it exceeds the accuracy of other algorithms by a wide margin, ranging from 17.7% to 60.92%. This paper confirms the claim of some studies that the decision tree overfits when used with large amounts of data and with real-valued, numeric attributes such as SMART attributes. Using the ANFIS algorithm to predict imminent hard disk failure yields an accuracy of 86.08%, surpassing the other algorithms (CHAID, C&R Tree, Neural Network, MLR, and SVM) by 4.2% while staying well below the 99.58% of the overfitted decision tree. ANFIS also predicted failures 5 days before they actually occurred.
Keywords: ANFIS, Predicting Hard Disk Failure, SMART Attributes, Data Center, Imminent Disk Failure
I. INTRODUCTION
Storing and backing up files in a data center or on a cloud storage platform ensures continuous operation, data protection, and recovery [1]. It is the data center's responsibility to put in place all the plans and procedures necessary to ensure this. Critical factors such as experience, financial stability, security, support, and physical infrastructure must be maintained for a data center to be reliable [2]. In the Philippines, 80% of business enterprises have experienced data loss, costing them around $8 billion [3]. On a larger scale, companies around the world experienced an average data loss of 400% in just two years, for a total loss of $1.7 trillion, 30% of which came from cloud storage. Admittedly, 51% of surveyed companies have no disaster recovery plan [4]. One study shows that the leading causes of data loss are hardware failure, human error, software corruption, computer viruses, and natural disasters, with hardware failure ranking first at 57% [5]. A more recent study by Data Barracks shows that hardware failure remains one of the top causes of data loss at 25%, with human error leading at 29% [6].
With the help of machine learning, failure can be preempted, thereby avoiding data loss before it happens. Several studies have examined this problem [7][8][9][10][11][12], each presenting varying results. Across these studies and the algorithms they used, the decision tree came out as the most accurate. Similarly, Suchatpong and Bhumkittipich's study compared the decision tree with a neural network, SVM, CHAID, and C&R Tree. The decision tree ranks first at 99.58%, while the neural network is 56.09% accurate, SVM 38.66%, CHAID 50.42%, and C&R Tree 56.93%, all far lower than the decision tree. While the result of the decision tree is promising, there is an observable irregularity in the result.
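The size of this irregularity can be made concrete with simple arithmetic on the accuracies quoted above. The following is a minimal sketch, not part of the cited study: the function names and the 30-point threshold are hypothetical choices made only for illustration.

```python
# Accuracies (%) as quoted above from Suchatpong and Bhumkittipich's comparison.
reported_accuracy = {
    "Decision Tree": 99.58,
    "Neural Network": 56.09,
    "SVM": 38.66,
    "CHAID": 50.42,
    "C&R Tree": 56.93,
}

# Hypothetical threshold in percentage points (an assumption, not from the paper).
SUSPICIOUS_GAP = 30.0


def accuracy_gaps(scores, candidate):
    """Return the candidate's accuracy minus each other model's accuracy."""
    top = scores[candidate]
    return {
        name: round(top - acc, 2)
        for name, acc in scores.items()
        if name != candidate
    }


def looks_overfitted(scores, candidate, threshold=SUSPICIOUS_GAP):
    """Flag the candidate when it beats every other model by more than the threshold."""
    return all(gap > threshold for gap in accuracy_gaps(scores, candidate).values())


gaps = accuracy_gaps(reported_accuracy, "Decision Tree")
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:.2f} points below the decision tree")
print("Suspicious:", looks_overfitted(reported_accuracy, "Decision Tree"))
```

With these figures the gap runs from 42.65 points (C&R Tree) to 60.92 points (SVM). A large gap is only a prompt for closer inspection and does not by itself prove overfitting.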
The decision tree is too accurate compared with the other algorithms in predicting hard drive failure. IBM states that if an algorithm reaches 98% accuracy while the other techniques tried reach 60%, it is most probably overfitting, which is exactly the case here [13]. Initial investigatio...