An Ensemble Method for Missing Data of Environmental Sensor Considering Univariate and Multivariate Characteristics

Choi, Chan-Young; Jung, Hae-Woong; Cho, Jaehyuk

doi:10.3390/s21227595

Cited by 6 publications

(4 citation statements)

References 49 publications


“…In this study, imputation was performed from different angles, according to the defined sensor imputation types [24]. We considered missing situations using case types that may occur in IoT-based environmental sensor modules and determined which node model affects the results.…”

Section: Missing Sensor Data Type (mentioning)

confidence: 99%

Missing-value Imputation of Environment Sensors Using Multilayer Stacking with Scoring Method

Chun, Cho 2023

Preprint

Self Cite


While analyzing environmental data, missing data reduce the predictive power of the model, and imputing missing data biases the parameter estimation, thereby increasing the uncertainty of the results. This study introduces a missing-data handling model based on multilayer stacking that synthesizes the characteristics of missing-data handling methods. The model uses an ensemble technique to integrate the advantages of existing models. The final meta learner of the ensemble is augmented with training data fused using a Kalman filter, and sensor-fusion features for multiple sensors with identical characteristics are added to the model. In a situation that includes the types of missing data used in the study, generating new learning data by collecting weights with the scoring method and weighting the existing learning data has the effect of matching the measurement environment. The performance of the model improved by 20% compared with the existing models used as nodes, in an environment where normal values and various types of defects were combined. In addition, the performance improved by 30% compared with the multiple imputation by chained equations (MICE) and single center imputation from multiple chained equations (SICE) models, which are commonly used for other sensor data with defects within a sensor group, and stable results were obtained. This shows that the proposed model reduces the cost of selecting a model for the various errors that may occur in environmental sensors. Because its sensitivity to different patterns of missing data can be checked, it can be applied in various environments and improved with more advanced node models in the future.


“…These systems are necessary to monitor the manner in which alternative and sustainable energy sources affect air quality when deployed. Also relevant is the research which aims to fill the gaps caused by the inevitable malfunction of IoT systems and sensors [12], which results in missing data about air quality parameters and can skew data one way or the other. Such methods are important when reliability and consistency of sensor readings is of utmost importance.…”

Section: Introduction (mentioning)

confidence: 99%

Methods of Measuring Air Pollution in Cities and Correlation of Air Pollutant Concentrations

Bodić, Rajs, Vasiljević Toskić et al. 2023

Processes


The monitoring of air quality continues to be one of the most important tasks when ensuring the safety of our environment. This paper aims to look at correlations between different types of pollutants, so that robust air quality measurement systems can be deployed in remote, inaccessible areas, at a reduced cost. The first matter at hand was to design an affordable and portable system capable of measuring different air pollutants. A custom PCB was designed that could support the acquisition of readings of, among others, particulate and CO sensors. Then, correlations between the concentrations of different pollutants were analyzed to identify if measuring the concentration of one type of pollutant can allow the extrapolation of the concentration of another. This particular study focuses on the correlations between the concentrations of particulate matter and CO. Finally, after observing a moderate correlation, it was proposed to measure the concentrations of pollutants that require less expensive sensors, and to extrapolate the concentrations of pollutants that require a more expensive sensor to measure their concentration. The link between particulate pollution and CO concentrations was identified and discussed as the result of this study.
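The correlation-then-extrapolation idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings are made up, the helper names `pearson` and `fit_line` are hypothetical, and an ordinary least-squares line stands in for whatever extrapolation the authors actually used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up paired readings (assumption): particulate matter and CO.
pm = [12.0, 18.5, 25.0, 31.0, 40.5, 55.0]
co = [0.4, 0.5, 0.7, 0.6, 0.9, 1.1]

r = pearson(pm, co)
slope, intercept = fit_line(pm, co)

# Estimate CO from a new particulate reading using the fitted line.
estimated_co = slope * 35.0 + intercept
```

With only a moderate correlation, as the abstract reports, an estimate like `estimated_co` would carry substantial uncertainty, so a sketch like this is a starting point rather than a replacement for the more expensive sensor.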


“…Other studies can be highlighted, especially related to medicine and health (Camargos et al, 2011;Carreras et al, 2021;Khan et al, 2021;Nunes, 2007;Payrovnaziri et al, 2021), air pollution (Choi et al, 2021;Ghazali et al, 2021;Pinto, 2013), engineering, mainly civil and traffic (Abdelgawad et al, 2015;Jiang et al, 2021), meteorology (Afrifa-Yamoah et al, 2020;Bier and Ferraz, 2017;Costa et al, 2021;Ferrari and Ozaki, 2014;García-Peña et al, 2014), agriculture (Jiao et al, 2016;Nishina et al, 2017;Swenson, 2014), energy (Barbosa et al, 2018;Pelisson, 2021) and education (Vinha and Laros, 2018).…”

Section: Introduction (mentioning)

confidence: 99%

Methodological approaches for imputing missing data into monthly flows series

Bleidorn, Pinto, Schmidt et al. 2022

Rev. ambiente água


Missing data is one of the main difficulties in working with fluviometric records. Database gaps may result from fluviometric stations components problems, monitoring interruptions and lack of observers. Incomplete series analysis generates uncertain results, negatively impacting water resources management. Thus, proper missing data consideration is very important to ensure better information quality. This work aims to analyze, comparatively, missing data imputation methodologies in monthly river-flow time series, considering, as a case study, the Doce River, located in Southeast Brazil. Missing data were simulated in 5%, 10%, 15%, 25% and 40% proportions following a random distribution pattern, ignoring the missing data generation mechanisms. Ten missing data imputation methodologies were used: arithmetic mean, median, simple and multiple linear regression, regional weighting, spline and Stineman interpolation, Kalman smoothing, multiple imputation and maximum likelihood. Their performances were compared through bias, root mean square error, absolute mean percentage error, determination coefficient and concordance index. Results indicate that for 5% missing data, any methodology for imputing can be considered, recommending caution for arithmetic mean method application. However, as the missing data proportion increases, it is recommended to use multiple imputation and maximum likelihood methodologies when there are support stations for imputation, and the Stineman interpolation and Kalman Smoothing methods when only the studied series is available. Keywords: Doce river, imputation, missing data.
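The evaluation protocol described in this abstract (mask values at random in fixed proportions, impute, score against the held-out truth) can be sketched as follows. This is a minimal illustration, not the authors' code: the series is synthetic, all function names are hypothetical, only mean and median imputation come from the abstract's list, and plain linear interpolation is used as a simple stand-in for the spline and Stineman interpolators.

```python
import math
import random
import statistics


def mask_at_random(series, proportion, seed=0):
    """Hide `proportion` of the points (set to None); return masked copy and hidden indices."""
    rng = random.Random(seed)
    n_missing = round(proportion * len(series))
    # Keep both endpoints so linear interpolation always has two anchors.
    missing = set(rng.sample(range(1, len(series) - 1), n_missing))
    masked = [None if i in missing else v for i, v in enumerate(series)]
    return masked, sorted(missing)


def impute_constant(masked, fill):
    """Replace every missing value with a single constant (mean or median)."""
    return [fill if v is None else v for v in masked]


def impute_linear(masked):
    """Fill each gap by a straight line between its nearest observed neighbours."""
    filled = list(masked)
    known = [i for i, v in enumerate(masked) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * masked[left] + w * masked[right]
    return filled


def rmse(truth, estimate, indices):
    """Root mean square error over the hidden positions only."""
    return math.sqrt(sum((truth[i] - estimate[i]) ** 2 for i in indices) / len(indices))


def compare(series, proportions=(0.05, 0.10, 0.15, 0.25, 0.40)):
    """RMSE of each imputation method at each missing-data proportion."""
    results = {}
    for p in proportions:
        masked, missing = mask_at_random(series, p)
        observed = [v for v in masked if v is not None]
        candidates = {
            "mean": impute_constant(masked, statistics.mean(observed)),
            "median": impute_constant(masked, statistics.median(observed)),
            "linear": impute_linear(masked),
        }
        results[p] = {name: rmse(series, est, missing) for name, est in candidates.items()}
    return results


# Synthetic monthly series with a seasonal cycle (assumption; not Doce River data).
synthetic_flow = [100 + 40 * math.sin(2 * math.pi * m / 12) for m in range(120)]
scores = compare(synthetic_flow)
```

`scores` maps each proportion to a dictionary of per-method RMSE values. The study also reports bias, mean absolute percentage error, the determination coefficient and a concordance index, which could be added alongside `rmse` in the same way.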

