In the domain of predictive maintenance, when trying to replicate and compare research in remaining useful life (RUL) estimation, we identified several inconsistencies and errors in the experimental methodology used by various researchers. These make the replication and comparison of results difficult, severely hindering both progress in this research domain and its practical application in industry. We survey the literature to evaluate the experimental procedures that were used, and identify the most common errors and omissions in both experimental procedures and reporting.
A total of 70 papers on RUL were audited. From this meta-analysis we estimate that approximately 11% of the papers present work that allows for replication and comparison. Surprisingly, only about 24.3% (17 of the 70 articles) compared their results with previous work. A further 41.4% generated and compared several models of their own and, somewhat unsettlingly, 31.4% made no comparison whatsoever. The remaining 2.9% did not use the same data set for their comparisons. The results of this study were also aggregated into three categories: problem class selection, model fitting best practices, and evaluation best practices. We conclude that model evaluation is the most problematic of the three.
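As a quick consistency check on the comparison breakdown, the sketch below back-computes approximate paper counts from the rounded percentages. It assumes that every percentage is a share of all 70 audited papers; only the 17-paper count is stated explicitly, so the other counts are inferred by rounding and are not reported figures. The labels are paraphrases chosen for illustration.

```python
# Back-compute paper counts from the rounded percentages in the abstract.
# Assumption: each percentage is a share of all 70 audited papers.
TOTAL = 70
reported_pct = {
    "compared with previous work": 24.3,
    "compared several own models": 41.4,
    "no comparison": 31.4,
    "different data set": 2.9,
}

# Nearest integer count for each category (inferred, not reported).
counts = {label: round(pct / 100 * TOTAL) for label, pct in reported_pct.items()}

for label, n in counts.items():
    # Recompute the percentage from the integer count to check the rounding.
    print(f"{label}: {n}/{TOTAL} = {100 * n / TOTAL:.1f}%")

# The four categories should partition all audited papers.
print("sum:", sum(counts.values()))
```

The inferred counts (17, 29, 22 and 2) sum to 70 and reproduce each reported percentage to one decimal place, which is consistent with the four categories partitioning the full set of audited papers.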
The main contribution of this article is a proposed experimental protocol, together with several recommendations that specifically target model evaluation. Adherence to this protocol should substantially facilitate the research and application of RUL prediction models. The goals are to promote collaboration between scholars and practitioners and to advance research in this domain.