2021
DOI: 10.3390/ijerph18031333

Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)

Abstract: In environmental research, missing data are often a challenge for statistical modeling. This paper addressed some advanced techniques to deal with missing values in a data set measuring air quality using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missing data techniques are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approa…
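The five missingness levels in the abstract can be illustrated with a minimal sketch of MCAR masking. This is not the paper's procedure: the function name `mask_mcar`, the seed, and the synthetic `readings` list are assumptions made for illustration, and only the Python standard library is used.

```python
import random

# Missingness levels listed in the abstract.
LEVELS = [0.05, 0.10, 0.20, 0.30, 0.40]

def mask_mcar(values, level, seed=0):
    """Return a copy of `values` with a fraction `level` of entries set to None.

    Under MCAR every entry is equally likely to be removed, regardless of
    its own value or of any other variable.
    """
    rng = random.Random(seed)
    n_missing = round(level * len(values))
    # Draw the positions to blank out uniformly at random, without replacement.
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]

# Hypothetical pollutant readings (synthetic, not the Kuwait data).
readings = [float(i % 50) for i in range(200)]
masked = {level: mask_mcar(readings, level) for level in LEVELS}
```

Each masked copy could then be passed to an imputation method and compared against the original `readings` to measure imputation error at that level. MAR or NMAR masking would instead make the removal probability depend on another observed variable or on the value itself.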

Cited by 50 publications (35 citation statements)
References 50 publications
“…In the age and income example above, assume that there are missed values only of income. If the missed values are observed in a specific income range, the missed values type is MNAR [3,11]. Some types of missing data adversely affect the analysis more than other types.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this dataset, time and location are as observed variables. The probability of missing an observation in pollutant concentration variable is independent of other observations but dependent on time and location variables (15-17). Thus, we can conclude that the missing mechanism of PM10 and O3 concentrations would be MAR.…”
Section: Identifying the Missing Mechanism (mentioning)
confidence: 88%
“…Determination of whether data is MAR or MNAR is often difficult as there is no reliable technique to do so. But, in some clinical or environmental studies, 4,14,46,47 Additionally, the utilization of MI in longitudinal designs with layered data may present challenges that may need the use of MI algorithms or other approaches other than MI. [51-53] Another challenge is that statistical packages vary with their ease of usability in respect to the merging variables and test statistics.…”
Section: Discussion (mentioning)
confidence: 99%
“…To overcome this problem, Rubin suggested the theory of multiple imputation, in which missing values are imputed using the appropriate model a few times (generally 3-5 times) and a standard method is applied for the analysis. 4,9,10,11 The imputation method provides more accurate results, but problems with the application of imputation include: (a) maximum use of the available data to reduce the error for univariate data and preserve covariance in multivariate data sets; and (b) reporting the variance estimates of uncertainty caused due to the imputed value. 11 Several parametric and non-parametric techniques have been employed to deal with missing values.…”
Section: Missing Imputation - Rubin's Approach (mentioning)
confidence: 99%
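
The citation statement above describes imputing several times, applying a standard analysis to each completed data set, and reporting variance that reflects imputation uncertainty. A minimal sketch of the usual pooling step (Rubin's rules) follows. The function name `pool_rubin` and the example numbers are hypothetical and do not come from the cited papers.

```python
from statistics import mean, variance

def pool_rubin(estimates, within_variances):
    """Pool m per-imputation estimates with Rubin's rules.

    estimates        -- the point estimate from each imputed data set
    within_variances -- the squared standard error from each imputed data set
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    pooled = mean(estimates)             # average of the m estimates
    within = mean(within_variances)      # average within-imputation variance
    between = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from m = 5 imputations (made-up numbers).
estimates = [10.0, 12.0, 11.0, 13.0, 9.0]
within_variances = [4.0, 4.0, 4.0, 4.0, 4.0]
pooled, total_var = pool_rubin(estimates, within_variances)
```

The between-imputation term is what a single imputation cannot supply: it grows when the imputed data sets disagree, which widens the reported uncertainty accordingly.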