2018
DOI: 10.29115/sp-2018-0003
Surveying the Forests and Sampling the Trees: An Overview of Classification and Regression Trees and Random Forests with Applications in Survey Research

Cited by 24 publications (22 citation statements)
References 14 publications
“…5 We use a supervised learning algorithm (random forests) to perform variable selection allowing for all possible interactions between the covariates in equation 1. Random forests are a nonparametric ensemble learning technique for regression that captures complex interactions and nonlinear structures in the data by using multiple decision trees grown from independent bootstrapped samples of the training data (see Breiman 2001; Buskirk 2018; Buskirk et al. 2018). The algorithm grows an independent decision tree (a weak learner) from each bootstrapped sample of the training data and then combines all the weak learners into a single strong learner by averaging across them.…”
Section: Methods
confidence: 99%
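A minimal sketch of the bagging mechanism this statement describes: one decision tree is grown per bootstrap sample, and the forest averages the weak learners' predictions into a single strong learner. The synthetic data, the tree count, and the use of scikit-learn's DecisionTreeRegressor are illustrative assumptions, not details taken from the cited papers.

```python
# Sketch of the bagging idea behind random forests: grow one decision tree
# per bootstrap sample, then average the trees' predictions.
# The data and all parameter values below are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 4))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=500)

n_trees = 100
trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))         # bootstrap sample (with replacement)
    tree = DecisionTreeRegressor(max_features="sqrt")  # random feature subset at each split
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# The strong learner: average the weak learners' predictions.
y_hat = np.mean([t.predict(X) for t in trees], axis=0)
```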
“…This technique allows us to identify complex patterns in the data that could not be identified using conventional empirical methods (e.g., ordinary least squares [OLS] or logistic regression), and it provides valuable information about the performance of each covariate (Hastie et al. 2009). The main advantages of using random forests as a variable selection technique are that they reduce over-fitting, by aggregating over multiple trees, and reduce bias, when trees are grown deep enough (see Breiman 2001; Buskirk 2018; Hastie et al. 2009). The analysis is divided into two steps.…”
Section: Methods
confidence: 99%
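The statement above uses random forests for variable selection. A minimal sketch of one common way to do this is to rank covariates by the forest's impurity-based feature importances and keep those above a cutoff; the data, the 500-tree forest, and the mean-importance cutoff are assumptions for illustration, not the cited papers' exact procedure.

```python
# Sketch of random-forest-based variable selection via impurity-based
# feature importances. Data, forest size, and cutoff are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = 2 * X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=500)

rf = RandomForestRegressor(n_estimators=500, random_state=1).fit(X, y)

# Keep covariates whose importance exceeds the mean importance
# (a common, if rough, selection rule; the cutoff is an assumption).
importances = rf.feature_importances_
selected = np.flatnonzero(importances > importances.mean())
print("selected covariates:", selected)
```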
“…[Advantages:] 1. Can handle outliers and missing data [89]; 2. Computationally fast [90]. [Disadvantages:] Models are based on splits that depend on previous splits; an error made in a higher split will propagate down [90]. Users need to pre-specify dependent (or target) variables. Abbreviations: CHAID, Chi-square Automatic Interaction Detector; CART, Classification and Regression Tree. # Some studies applied multiple methods in tandem or in combination.…”
Section: Results
confidence: 99%
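The "error propagates down" disadvantage follows from the hierarchical structure of CART-style trees: every split is conditioned on all the splits above it. A small sketch that prints a fitted tree's rules makes that dependence visible; the dataset and tree depth are illustrative assumptions.

```python
# Illustration of the hierarchical nature of CART-style trees: each leaf's
# rule is the conjunction of every ancestor split on the path from the
# root, so a poor split near the top affects all nodes beneath it.
# The dataset below is a synthetic assumption.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=3)
tree = DecisionTreeClassifier(max_depth=3, random_state=3).fit(X, y)

# Print the nested splitting rules of the fitted tree.
print(export_text(tree, feature_names=[f"x{i}" for i in range(4)]))
```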
“…We estimate the tuning parameter using 10-fold cross-validation of the training data along with a one-standard-error rule, and run a random forest to predict the outcome using 100 classification trees (Buskirk 2018).…”
Section: Imprf
confidence: 99%
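A sketch of the tuning procedure the statement outlines: score each candidate value by 10-fold cross-validation, apply a one-standard-error rule, and refit a 100-tree random forest at the chosen value. The tuned parameter (max_features), the candidate grid, and the synthetic data are assumptions for illustration; the statement does not specify them.

```python
# Sketch: tune a random forest classifier by 10-fold cross-validation with
# a one-standard-error rule, then refit with 100 trees. The tuned
# parameter (max_features), grid, and data are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=2)

grid = [2, 4, 6, 8, 10]  # candidate values for max_features (assumed)
means, ses = [], []
for m in grid:
    scores = cross_val_score(
        RandomForestClassifier(n_estimators=100, max_features=m, random_state=2),
        X, y, cv=10)
    means.append(scores.mean())
    ses.append(scores.std(ddof=1) / np.sqrt(len(scores)))

# One-standard-error rule: among candidates whose mean score is within one
# standard error of the best, prefer the most parsimonious (here taken to
# be the smallest max_features, itself an interpretive assumption).
best = int(np.argmax(means))
threshold = means[best] - ses[best]
chosen = min(m for m, mu in zip(grid, means) if mu >= threshold)

final_rf = RandomForestClassifier(n_estimators=100, max_features=chosen,
                                  random_state=2).fit(X, y)
```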