2009
DOI: 10.1080/02827580902870490

The roles of nearest neighbor methods in imputing missing data in forest inventory and monitoring databases

Abstract: Almost universally, forest inventory and monitoring databases are incomplete, ranging from missing data for only a few records and a few variables, common for small land areas, to missing data for many observations and many variables, common for large land areas. For a wide variety of applications, nearest neighbor (NN) imputation methods have been developed to fill in observations of variables that are missing on some records (Y-variables), using related variables that are available for all records (X-variabl…

Cited by 141 publications (96 citation statements)
References 51 publications
“…We used nearest neighbour methods [11] to impute the size class distributions for the target prediction units. The distance measure to identify nearest neighbours was calculated using randomForest and the predictions are referred to as randomForest Nearest Neighbour (RFNN).…”
Section: Non Parametric or Randomforest Nearest Neighbour Prediction (mentioning)
confidence: 99%
“…Imputation is used to associate expensive but sparse data with inexpensive and spatially comprehensive data [11]. The response variable is measured on a subset of the prediction units in the population (the reference data set), and auxiliary or predictor variables are available for the entire population.…”
Section: Introduction (mentioning)
confidence: 99%
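
The setup described in the statement above (a response measured only on reference units, predictors available for every unit) can be illustrated with a minimal sketch. It assumes plain Euclidean distance in X-space and averages the Y-values of the k nearest reference units; the function name `knn_impute` and the toy data are hypothetical and do not come from the cited papers, which may use other distance measures such as randomForest-based proximity.

```python
import math


def knn_impute(reference_x, reference_y, target_x, k=1):
    """Impute Y for each target unit from its k nearest reference units.

    reference_x: X-variables for units where Y was measured
    reference_y: measured Y-variables for those same units
    target_x:    X-variables for units where Y is missing
    """
    n_y = len(reference_y[0])
    imputed = []
    for tx in target_x:
        # Euclidean distance in X-space to every reference unit (an assumption;
        # other distance measures are possible).
        ranked = sorted(
            (math.dist(tx, rx), i) for i, rx in enumerate(reference_x)
        )
        neighbours = [i for _, i in ranked[:k]]
        # Average each Y-variable over the selected neighbours.
        imputed.append(
            [sum(reference_y[i][j] for i in neighbours) / len(neighbours)
             for j in range(n_y)]
        )
    return imputed


# Hypothetical toy data: three reference units, two target units.
reference_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
reference_y = [[10.0], [20.0], [100.0]]
target_x = [[0.1, 0.0], [4.0, 4.5]]

result_k1 = knn_impute(reference_x, reference_y, target_x, k=1)
result_k2 = knn_impute(reference_x, reference_y, target_x, k=2)
```

With k=1, each imputed record is a copy of the Y-values observed on one real reference unit; with k>1, the sketch averages across neighbours, so imputed values need not match any observed record.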
“…Model-based imputation methods use other variables in the dataset to impute missing data, but they substantially alter the univariate trait distributions and the covariance structure of the dataset (Gelman and Hill, 2007). Approaches such as k nearest neighbour (kNN) or machine-learning methods (Stekhoven and Bühlmann, 2012) may be more appropriate to impute multivariate datasets, preserving their covariance structure (Eskelson et al, 2009; Penone et al, 2014). In a multiple imputation framework, m imputed datasets are obtained through simulation and may be jointly analysed to provide parameter estimates that take into account the uncertainty introduced by the imputations themselves (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…While forest inventories have adopted statistical imputation methods for some time, as for example the kNN methods (Eskelson et al, 2009, and references therein), imputation methods have only recently started to be used in trait-based ecology (Baraloto et al, 2010; Pyšek et al, 2015).…”
Section: Introduction (mentioning)
confidence: 99%
“…More sophisticated statistical modeling and machine-learning techniques have been developed in statistics and computer sciences, and tested and applied to tackle missing data problems in many fields such as industrial engineering (Lakshminarayan et al, 1999) and forestry (Eskelson et al, 2009) etc. Waddell (2009) suggests that k-nearest neighbors and support vector machines may be two promising techniques for imputing missing land use data, particularly parcel-level data.…”
Section: Introduction (mentioning)
confidence: 99%