Dealing with missing data as it pertains of e-maintenance

Loukopoulos, Panagiotis; Zolkiewski, George; Bennett, I.J.; Pilidis, Pericles; Duan, Fang; David, Éric

doi:10.1108/jqme-08-2016-0032

Cited by 7 publications

(6 citation statements)

References 38 publications

“…Although extensive work has been carried out under ML for CBM, yet little attention has been paid to the data preparation phase. According to (Bennane and Yacout, 2010;Loukopoulos et al, 2017;Diez-Olivan et al, 2019), the relevance of the data preparation phase has been widely recognized in the literature but still few research efforts have been carried out to address this issue in CBM context.…”

Section: Literature Review (mentioning)

confidence: 99%

“…Data were cleaned using the Logical Analysis of Data (LAD) model; then a supervised learning algorithm was used to predict the health state of an oil transformer system. Loukopoulos et al (2017) have also presented different imputation techniques to handle the missing data, for the CBM application on centrifugal compressors. Among these techniques, autoregressive model, k-NN imputation, Self Organizing Map (SOM) and Bayesian Principal Components Analysis (BPCA) were used to fill the missing data.…”

Section: Literature Review (mentioning)

confidence: 99%

“…The neighborhood size k selection plays an important role in resulting in a good performance of k-NN. However, as pointed out by (Loukopoulos et al, 2017), no global rule is set for determining this optimal k. In the present paper, preliminary experiments as the one done by (Thanh Noi and Kappas, 2018), with different values of k between 1 and 20, are conducted. Then the k value which gave the lowest value of MAPE is selected Data normalization consists of scaling the features so they can fall within a smaller range, improving the efficiency and the accuracy of ML algorithms, (Han et al, 2011).…”

Section: Data Preparation Techniques (mentioning)

confidence: 99%
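The selection procedure described in the statement above (try k from 1 to 20 and keep the value with the lowest MAPE) can be sketched as follows. This is a minimal illustration, not the citing authors' code: the hold-out validation split, the plain k-NN regressor, the toy data, and the names `knn_predict`, `mape` and `select_k` are all assumptions made for the example.

```python
import math

def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no zero targets."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)

def select_k(train_X, train_y, val_X, val_y, k_values=range(1, 21)):
    """Return (best_k, scores); scores maps each tried k to its validation MAPE."""
    scores = {}
    for k in k_values:
        if k > len(train_X):
            break  # cannot use more neighbours than training points
        preds = [knn_predict(train_X, train_y, x, k) for x in val_X]
        scores[k] = mape(val_y, preds)
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Hypothetical toy data: y = 2x + 1 sampled on an even grid,
# validated on points that fall between the grid points.
train_X = [(float(i),) for i in range(0, 40, 2)]
train_y = [2.0 * x[0] + 1.0 for x in train_X]
val_X = [(float(i),) for i in range(1, 39, 4)]
val_y = [2.0 * x[0] + 1.0 for x in val_X]

best_k, scores = select_k(train_X, train_y, val_X, val_y)
print(best_k, scores[best_k])
```

On real condition-monitoring data the scores would normally come from cross-validation rather than a single split, and the best k depends on the data set, which is consistent with the remark that no global rule exists.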

“…However, as pointed out in (Bennane and Yacout, 2010;Loukopoulos et al, 2017;Diez-Olivan et al, 2019), the majority of these works have centered in comparing performances of different ML algorithms in degradation prediction; however, they did not give enough insight on the data preparation phase. Only a few works, such as (Bukhsh et al, 2020) have exhibited the data preparation technique used before applying predictive ML models for bridges intelligent maintenance.…”

Section: Introduction (mentioning)

confidence: 99%

Data Preparation in Machine Learning for Condition-based Maintenance

Masmoudi, Jaoua, Jaoua et al. 2021

Journal of Computer Science

Using Machine Learning (ML) prediction to achieve a successful, cost-effective, Condition-Based Maintenance (CBM) strategy has become very attractive in the context of Industry 4.0. In other fields, it is well known that in order to benefit from the prediction capability of ML algorithms, the data preparation phase must be well conducted. Thus, the objective of this paper is to investigate the effect of data preparation on the ML prediction accuracy of Gas Turbines (GTs) performance decay. First a data cleaning technique for robust Linear Regression imputation is proposed based on the Mixed Integer Linear Programming. Then, experiments are conducted to compare the effect of commonly used data cleaning, normalization and reduction techniques on the ML prediction accuracy. Results revealed that the best prediction accuracy of GTs decay, found with the k-Nearest Neighbors ML algorithm, considerately deteriorate when changing the data preparation steps and/or techniques. This study has shown that, for effective CBM application in industry, there is a need to develop a systematic methodology for design and selection of adequate data preparation steps and techniques with the proposed ML algorithms.

“…It has been shown that different PCA based methods perform similar with regards to accuracy [14]. Another work introduces an ad hoc category which includes mean, median and last observation carried forward (LOF) [13]. Among those methods, kNN is accurate and efficient [8], [11].…”

Section: Background and Related Work (mentioning)

confidence: 99%

An improved k-nearest neighbours method for traffic time series imputation

Sun, Cheng et al. 2017

2017 Chinese Automation Congress (CAC)

Intelligent transportation systems (ITS) are becoming more and more effective, benefiting from big data. Despite this, missing data is a problem that prevents many prediction algorithms in ITS from working effectively. Much work has been done to impute those missing data. Among different imputation methods, k-nearest neighbours (kNN) has shown excellent accuracy and efficiency. However, the general kNN is designed for matrix instead of time series so it lacks the usage of time series characteristics such as windows and weights that are gap-sensitive. This work introduces gap-sensitive windowed kNN (GSW-kNN) imputation for time series. The results show that GSW-kNN is 34% more accurate than benchmarking methods, and it is still robust even if the missing ratio increases to 90%.
