2015
DOI: 10.1136/bmjopen-2014-007450
|View full text |Cite
|
Sign up to set email alerts
|

Using decision trees to understand structure in missing data

Abstract: ObjectivesDemonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data.SettingData taken from employees at 3 different industrial sites in Australia.Participants7915 observations were included.Materials and methodsThe approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included stan… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

1
56
0
1

Year Published

2017
2017
2023
2023

Publication Types

Select...
6
2

Relationship

0
8

Authors

Journals

citations
Cited by 46 publications
(63 citation statements)
references
References 28 publications
1
56
0
1
Order By: Relevance
“…Item non-response was addressed first by looking for missingness patterns using aggregation plots, matrix plots, and recursive partitioning. [14, 15] Using the MICE algorithm, multiple imputation was used to generate 50 complete datasets. [15] The imputation model included key variables associated with missingness as well as variables judged useful for imputation by the MICE algorithm using influx and outflux statistics.…”
Section: Methodsmentioning
confidence: 99%
“…Item non-response was addressed first by looking for missingness patterns using aggregation plots, matrix plots, and recursive partitioning. [14, 15] Using the MICE algorithm, multiple imputation was used to generate 50 complete datasets. [15] The imputation model included key variables associated with missingness as well as variables judged useful for imputation by the MICE algorithm using influx and outflux statistics.…”
Section: Methodsmentioning
confidence: 99%
“…The rpart program uses a native algorithm of “surrogate splits” to handle missing data in the predictor variables (when a value for a predictor variable is missing, and that variable needs to be used to determine a split, an alternative variable that is highly correlated with the missing variable is used to determine the direction of the split) [26]. Thus, we used the entire cohort for outcomes were recorded regardless of missing data among predictors, depending on the surrogate split function.…”
Section: Methodsmentioning
confidence: 99%
“…BRT have also been compared favourably with other flexible regression approaches such as generalised additive models [14]. An example of BRT models helping in developing an understanding of missingness structure in the data is given by [26]. In this study…”
Section: Discussionmentioning
confidence: 99%
“…Tierney [26] concluded that more knowledge was gained about the origins of the data and the data collection process, as well as the handling of missing values for future analysis. In another study [26], the author took a different approach to deal with missing values by taking summary values such as the mean over grouped data.…”
Section: Discussionmentioning
confidence: 99%
See 1 more Smart Citation