A Spatio-Temporal Data Imputation Model for Supporting Analytics at the Edge

Kolomvatsos, Kostas; Papadopoulou, Panagiota; Anagnostopoulos, Christos; Hadjiefthymiades, Stathes

doi:10.1007/978-3-030-29374-1_12

Cited by 8 publications

(7 citation statements)

References 26 publications


“…RMSE is particularly sensitive to outliers as it squares the difference between the predicted value and the observed value. RMSE presents error values in the same scale as the original variable [17] and it has been widely applied in time series analysis [18].…”

Section: Evaluation Metric (mentioning)

confidence: 99%

Short and Very Short Term Firm-level Load Forecasting for Warehouses: A Comparison of Machine Learning and Deep Learning Models

Ribeiro, Carmo, Endo et al. 2022, Preprint


Commercial buildings are a significant consumer of energy worldwide. Logistics facilities, and specifically warehouses, are a common building type yet under-researched in the demand-side energy forecasting literature. Warehouses have an idiosyncratic profile when compared to other commercial and industrial buildings, with a significant reliance on a small number of energy systems. As such, warehouse owners and operators are increasingly entering into energy performance contracts with energy service companies (ESCOs) to minimise environmental impact, reduce costs, and improve competitiveness. ESCOs and warehouse owners and operators require accurate forecasts of their energy consumption so that precautionary and mitigation measures can be taken. This paper explores the performance of three machine learning models (Support Vector Regression (SVR), Random Forest, and Extreme Gradient Boosting (XGBoost)), three deep learning models (Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU)), and a classical time series model, Autoregressive Integrated Moving Average (ARIMA), for predicting daily energy consumption. The dataset comprises 8,040 records generated over an 11-month period from January to November 2020 from a non-refrigerated logistics facility located in Ireland. The grid search method was used to identify the best configurations for each model. The proposed XGBoost models outperform other models for both very short term load forecasting (VSTLF) and short term load forecasting (STLF); the ARIMA model performed the worst.
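The outlier sensitivity of RMSE noted in the citation statement above can be illustrated with a small sketch. This is not code from the citing paper; the function names `rmse` and `mae` and the sample values are assumptions chosen for illustration. Mean absolute error (MAE) is included only as a point of comparison.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error, shown only for comparison with RMSE."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical values, not taken from any of the cited datasets.
observed = [10.0, 12.0, 11.0, 13.0]
spread_out = [11.0, 11.0, 12.0, 12.0]   # every prediction is off by 1
one_outlier = [10.0, 12.0, 11.0, 17.0]  # a single prediction is off by 4

print(mae(observed, spread_out), rmse(observed, spread_out))
print(mae(observed, one_outlier), rmse(observed, one_outlier))
```

Both prediction sets have the same total absolute error, so their MAE is identical, but squaring the single large residual makes the RMSE of the second set twice as large. Both metrics stay in the units of the original variable.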


“…Efficient missing value imputation (Patil et al, 2010) Technique is generalized and can be utilized for many data sets (Ishay and Herman, 2015) Impute missing values and build clusters as a unified integrated process (Abdallah and Shimshoni, 2016) K-means + radial basis function (RBF) Faster convergence speed, higher stability, accuracy (Shi et al, 2018) Local least squares Local data clustering being incorporated for improved quality and efficiency (Keerin et al, 2013) Multiple kernel density Accuracy and efficiency (Liao et al, 2018) Rough set Handles the uncertainty and vagueness existing in data sets (Amiri and Jensen, 2016) Less computational complexity (Azam et al, 2018) Overcome the problem of crispness (Raja et al, 2019) Shell neighbor Fills in an incomplete instance in a given data set by only using its left and right nearest neighbors with respect to each factor (attribute) and generalized to deal with data sets of mixed attributes (Zhang, 2011) Sliding window Applicable for IoT devices' data (Kolomvatsos et al, 2019) Soft cluster Overcomes the problems of inconsistency (Raja and Thangavel, 2016) Decision tree Branch-exclusive splits trees (BEST) A new classification procedure that can handle missing values by using data partitioning and better accuracy (Beaulac and Rosenthal, 2020) Boosted trees Able to handle missingness from data fusion, deterministic or distribution-free data sets (D'Ambrosio et al, 2012) C4.5 Generalized approach that uses index measure in the estimation of missing values (Madhu and Rajinikanth, 2012) Classification and regression trees (CART) A robust method to deal with different missing value types (Nikfalazar et al, 2020) Decision trees and forests A higher quality of imputation using similarity and correlations…”

Section: K-means (mentioning)

confidence: 99%

A systematic review of machine learning-based missing value imputation techniques

Thomas, Rajabi 2021, DTA


Purpose: The primary aim of this study is to review the studies from different dimensions including type of methods, experimentation setup and evaluation metrics used in the novel approaches proposed for data imputation, particularly in the machine learning (ML) area. This ultimately provides an understanding about how well the proposed framework is evaluated and what type and ratio of missingness are addressed in the proposals. The review questions in this study are (1) what are the ML-based imputation methods studied and proposed during 2010–2020? (2) How the experimentation setup, characteristics of data sets and missingness are employed in these studies? (3) What metrics were used for the evaluation of imputation method?

Design/methodology/approach: The review process went through the standard identification, screening and selection process. The initial search on electronic databases for missing value imputation (MVI) based on ML algorithms returned a large number of papers totaling at 2,883. Most of the papers at this stage were not exactly an MVI technique relevant to this study. The literature reviews are first scanned in the title for relevancy, and 306 literature reviews were identified as appropriate. Upon reviewing the abstract text, 151 literature reviews that are not eligible for this study are dropped. This resulted in 155 research papers suitable for full-text review. From this, 117 papers are used in assessment of the review questions.

Findings: This study shows that clustering- and instance-based algorithms are the most proposed MVI methods. Percentage of correct prediction (PCP) and root mean square error (RMSE) are most used evaluation metrics in these studies. For experimentation, majority of the studies sourced the data sets from publicly available data set repositories. A common approach is that the complete data set is set as baseline to evaluate the effectiveness of imputation on the test data sets with artificially induced missingness. The data set size and missingness ratio varied across the experimentations, while missing datatype and mechanism are pertaining to the capability of imputation. Computational expense is a concern, and experimentation using large data sets appears to be a challenge.

Originality/value: It is understood from the review that there is no single universal solution to missing data problem. Variants of ML approaches work well with the missingness based on the characteristics of the data set. Most of the methods reviewed lack generalization with regard to applicability. Another concern related to applicability is the complexity of the formulation and implementation of the algorithm. Imputations based on k-nearest neighbors (kNN) and clustering algorithms which are simple and easy to implement make it popular across various domains.


“…In line with the former track, the work in [23] advances a double layered clustered scheme along with a consensus-based framework aimed at substituting missing values from the sensors measurements. In particular, the nodes located at the edge perform the data imputation.…”

Section: Related Work (mentioning)

confidence: 99%

“…Second, many existing experiments (see, e.g., [24, 25, 26, 32]) are performed by simplistic data missing models, while we consider the problem of bursty missing values, which often arises when a sensor becomes unavailable for a certain (finite) period of time (a common situation for environmental sensors). Finally, compared to other works (see, e.g., [23, 24, 34, 35]), which only account for one performance analysis (for instance, the imputation method accuracy), in our approach, we complement this analysis with a time assessment, which seems critical within real-time environmental data. Table 1 summarizes the main elements of novelty of our proposal with respect to some existing literature.…”

Section: Related Work (mentioning)

confidence: 99%

Embedded Data Imputation for Environmental Intelligent Sensing: A Case Study

Erhan, Mauro, Anjum et al. 2021, Sensors


Recent developments in cloud computing and the Internet of Things have enabled smart environments, in terms of both monitoring and actuation. Unfortunately, this often results in unsustainable cloud-based solutions, whereby, in the interest of simplicity, a wealth of raw (unprocessed) data are pushed from sensor nodes to the cloud. Herein, we advocate the use of machine learning at sensor nodes to perform essential data-cleaning operations, to avoid the transmission of corrupted (often unusable) data to the cloud. Starting from a public pollution dataset, we investigate how two machine learning techniques (kNN and missForest) may be embedded on Raspberry Pi to perform data imputation, without impacting the data collection process. Our experimental results demonstrate the accuracy and computational efficiency of edge-learning methods for filling in missing data values in corrupted data series. We find that kNN and missForest correctly impute up to 40% of randomly distributed missing values, with a density distribution of values that is indistinguishable from the benchmark. We also show a trade-off analysis for the case of bursty missing values, with recoverable blocks of up to 100 samples. Computation times are shorter than sampling periods, allowing for data imputation at the edge in a timely manner.
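The kNN imputation mentioned in this abstract can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the implementation the authors ran on Raspberry Pi: the function name `knn_impute`, the use of `None` as the missing-value marker, the Euclidean distance over observed features, and the sample readings are all hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows.

    Distance is Euclidean, computed only over the features that are
    observed in the incomplete row (an assumption of this sketch).
    """
    complete = [r for r in rows if None not in r]
    if len(complete) < k:
        raise ValueError("need at least k complete rows")

    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed_idx = [i for i, v in enumerate(row) if v is not None]

        def distance(candidate):
            return math.sqrt(
                sum((row[i] - candidate[i]) ** 2 for i in observed_idx)
            )

        # The k complete rows closest to the incomplete one.
        neighbours = sorted(complete, key=distance)[:k]
        imputed.append([
            v if v is not None else sum(n[i] for n in neighbours) / k
            for i, v in enumerate(row)
        ])
    return imputed


# Hypothetical two-feature sensor readings; the last one lost its second value.
readings = [
    [20.0, 50.0],
    [21.0, 52.0],
    [30.0, 80.0],
    [20.5, None],
]
result = knn_impute(readings, k=2)
print(result[3])
```

Here the incomplete reading is closest to the first two rows on its observed feature, so the gap is filled with the mean of their second values. A sketch like this only suits small windows of data; for bursty gaps, where many consecutive readings are missing, few nearby complete rows may exist and the neighbours chosen can be less representative.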
