Novel imputation for time series data

Chang, C. W. David; Wang, Cheng-Ru; Lee, Shie-Jue

doi:10.1109/icmlc.2015.7340675

Cited by 3 publications

(3 citation statements)

References 7 publications


“…Increased accuracy and handles missing values randomly (Khotimah et al, 2019) (continued ) (Lee and Kim, 2018) Utilize the kernel partial least squares in handling and classifying missing data (Gao et al, 2013) Imputes the missing data utilizing the mode's historical data and its neighbor nodes current data jointly (Chang et al, 2015) Regression tree Improving the imputation accuracy in a sparse environment (Higashijima et al, 2010) Sample based Superior performance even when absent ratio is relatively intensive (Gao et al, 2015) Support vector regression (SVR) Can be easily adapted for other platforms of gene subsets (Bayrak and Ogul, 2017) (continued )…”

Section: ) | mentioning

confidence: 99%

A systematic review of machine learning-based missing value imputation techniques

Thomas

Rajabi

2021

DTA


Purpose: The primary aim of this study is to review the studies from different dimensions including type of methods, experimentation setup and evaluation metrics used in the novel approaches proposed for data imputation, particularly in the machine learning (ML) area. This ultimately provides an understanding about how well the proposed framework is evaluated and what type and ratio of missingness are addressed in the proposals. The review questions in this study are (1) what are the ML-based imputation methods studied and proposed during 2010–2020? (2) How the experimentation setup, characteristics of data sets and missingness are employed in these studies? (3) What metrics were used for the evaluation of imputation method?

Design/methodology/approach: The review process went through the standard identification, screening and selection process. The initial search on electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers totaling at 2,883. Most of the papers at this stage were not exactly an MVI technique relevant to this study. The literature reviews are first scanned in the title for relevancy, and 306 literature reviews were identified as appropriate. Upon reviewing the abstract text, 151 literature reviews that are not eligible for this study are dropped. This resulted in 155 research papers suitable for full-text review. From this, 117 papers are used in assessment of the review questions.

Findings: This study shows that clustering- and instance-based algorithms are the most proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are most used evaluation metrics in these studies. For experimentation, majority of the studies sourced the data sets from publicly available data set repositories. A common approach is that the complete data set is set as baseline to evaluate the effectiveness of imputation on the test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experimentations, while missing datatype and mechanism are pertaining to the capability of imputation. Computational expense is a concern, and experimentation using large data sets appears to be a challenge.

Originality/value: It is understood from the review that there is no single universal solution to missing data problem. Variants of ML approaches work well with the missingness based on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputations based on k-nearest neighbors (kNN) and clustering algorithms which are simple and easy to implement make it popular across various domains.
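The evaluation setup described in the findings (a complete data set as baseline, artificially induced missingness, RMSE as the metric) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `induce_missingness`, `mean_impute` and `rmse_on_masked` are hypothetical, values are removed completely at random, and mean imputation stands in as a placeholder method rather than any technique from the reviewed studies.

```python
import math
import random


def induce_missingness(values, ratio, seed=0):
    """Hide a fraction `ratio` of the values at random (assumed mechanism).

    Returns the masked copy and the sorted indices that were hidden.
    """
    rng = random.Random(seed)
    n_missing = int(round(ratio * len(values)))
    masked_idx = sorted(rng.sample(range(len(values)), n_missing))
    masked = list(values)
    for i in masked_idx:
        masked[i] = math.nan
    return masked, masked_idx


def mean_impute(values):
    """Placeholder imputer: fill every gap with the mean of the observed values."""
    observed = [v for v in values if not math.isnan(v)]
    fill = sum(observed) / len(observed)
    return [fill if math.isnan(v) else v for v in values]


def rmse_on_masked(truth, imputed, masked_idx):
    """Root mean square error, computed only at the artificially hidden positions."""
    if not masked_idx:
        raise ValueError("no masked positions to evaluate")
    squared = [(truth[i] - imputed[i]) ** 2 for i in masked_idx]
    return math.sqrt(sum(squared) / len(squared))
```

A typical run would mask the complete series with `induce_missingness(truth, 0.3)`, pass the masked copy to the imputer under test, and report `rmse_on_masked(truth, imputed, masked_idx)`; repeating this over several ratios gives the kind of missingness sweep the review describes.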


“…Next, we will experiment by generating random NA values in the previous time series and calculate the NA values by applying the average of the nearest neighbors (previous and next) with LANN algorithm according to equation (1).…”

Section: A Local Average of Nearest Neighbors (LANN) | mentioning

confidence: 99%

“…Time series data are used in a large variety of real-world applications, and they often encounter the missing value problem due to data transmission errors, machine malfunction, or human errors [1]. While imputation in general is a well-known problem and widely covered by different tools, finding algorithms or techniques able to fill missing values in univariate time series is more complicated [2].…”

Section: Introduction | mentioning

confidence: 99%

Local Average of Nearest Neighbors: Univariate Time Series Imputation

Flores

Tito

Silva

2019

IJACSA


The imputation of time series is one of the most important tasks in the homogenization process; the quality and precision of this process directly influence the accuracy of time series predictions. This paper proposes two simple but quite powerful algorithms for univariate time series imputation, both based on the means of the nearest neighbors of the missing value. The first, Local Average of Nearest Neighbors (LANN), calculates the missing value as the average of the neighbors immediately before and after it. The second, Local Average of Nearest Neighbors+ (LANN+), adds a threshold parameter that selects the calculation according to the difference between those neighbors: when the difference is less than or equal to the threshold, the missing value is calculated with LANN; when it is greater, the missing value is calculated as the average of the four closest neighbors, two before and two after the missing value. Imputation results on different time series are promising.
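The two rules in this abstract can be sketched in a few lines of Python. This is a minimal reading of the abstract, not the authors' implementation: the names `lann`, `lann_plus` and `_neighbors` are hypothetical, and the handling of series edges and consecutive gaps (use whichever observed neighbors exist, skipping other missing values) is an assumption the abstract does not address.

```python
import math


def _neighbors(series, i, step, count):
    """Collect up to `count` observed values, walking from index i in direction `step`."""
    found = []
    j = i + step
    while 0 <= j < len(series) and len(found) < count:
        if not math.isnan(series[j]):
            found.append(series[j])
        j += step
    return found


def lann(series):
    """Fill each NaN with the mean of the nearest observed value before and after it."""
    out = list(series)
    for i, v in enumerate(series):
        if math.isnan(v):
            near = _neighbors(series, i, -1, 1) + _neighbors(series, i, +1, 1)
            if near:  # assumption: leave the gap if no observed neighbor exists
                out[i] = sum(near) / len(near)
    return out


def lann_plus(series, threshold):
    """LANN when the two neighbors differ by at most `threshold`;
    otherwise average the four closest neighbors (two before, two after)."""
    out = list(series)
    for i, v in enumerate(series):
        if not math.isnan(v):
            continue
        near = _neighbors(series, i, -1, 1) + _neighbors(series, i, +1, 1)
        if len(near) == 2 and abs(near[0] - near[1]) > threshold:
            near = _neighbors(series, i, -1, 2) + _neighbors(series, i, +1, 2)
        if near:
            out[i] = sum(near) / len(near)
    return out
```

For the series `[1.0, nan, 3.0, 10.0, nan, 20.0, 22.0]`, `lann` fills the gaps with 2.0 and 15.0. With a threshold of 5.0, `lann_plus` keeps 2.0 for the first gap (neighbor difference 2) but fills the second with 13.75, the mean of 3.0, 10.0, 20.0 and 22.0, because its neighbors differ by 10.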
