2014
DOI: 10.1214/14-aoas755

Variable selection for BART: An application to gene regulation

Abstract: We consider the task of discovering gene regulatory networks, which are defined as sets of genes and the corresponding transcription factors which regulate their expression levels. This can be viewed as a variable selection problem, potentially with high dimensionality. Variable selection is especially challenging in high-dimensional settings, where it is difficult to detect subtle individual effects and interactions between predictors. Bayesian Additive Regression Trees [BART, Ann. Appl. Stat. 4 (2010) 266-29…

citations
Cited by 120 publications
(166 citation statements)
references
References 39 publications
“…To establish estimates of what variables matter and to gain a (rough) sense of the structure of each BART model, we calculate each variable's inclusion proportion. This quantity measures the number of times a specific variable was used in BART's tree models, divided by the total number of variables used in all of BART's tree models (Chipman et al., 2010; Kapelner & Bleich, 2014). The more often a variable is used in predicting the response, the higher its inclusion proportion will be. Plots of the top 20 variables by inclusion proportion (for each model) are included in the supplemental materials. Note that, for the Sample I and Sample II self-rated health and depression models, social-psychological features (e.g.…”
mentioning
confidence: 99%
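The inclusion proportion described in the statement above can be sketched in a few lines. This is a minimal illustration, not the code of any BART package: it assumes each tree has already been reduced to a plain list of the variable names used in its splitting rules, and it takes the denominator to be the total number of splitting rules across all trees. The function name `inclusion_proportions` and the toy `trees` data are hypothetical.

```python
from collections import Counter


def inclusion_proportions(trees, variables):
    """Share of all splitting rules that use each variable.

    trees: list of lists; each inner list holds the variable names used
           in one tree's splitting rules (a hypothetical representation).
    variables: every candidate predictor, including unused ones.
    """
    counts = Counter(var for tree in trees for var in tree)
    total = sum(counts.values())
    if total == 0:
        # No splits anywhere: every proportion is zero.
        return {var: 0.0 for var in variables}
    return {var: counts[var] / total for var in variables}


# Toy ensemble of four trees; the last tree is a stump with no splits.
trees = [["x1", "x2", "x1"], ["x3"], ["x1", "x2"], []]
props = inclusion_proportions(trees, ["x1", "x2", "x3", "x4"])
```

Here `x1` appears in 3 of the 6 splitting rules, so its proportion is 0.5, while the unused `x4` gets 0. The proportions of all variables sum to 1 whenever at least one split exists.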
“…Bleich et al (2014) discuss various ways of choosing appropriate thresholds for the variable inclusion proportions to use BART for variable selection such as the “local” and “global max threshold” options. Both of these involve permuting the response variable and running the BART model a number of times.…”
Section: Results
mentioning
confidence: 99%
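The two thresholding rules named in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the inclusion proportions from the real fit (`observed`) and from each fit on a permuted response (`null_runs`) are already computed, and all numbers below are made up. The "local" rule compares each variable with a quantile of its own null proportions; the "global max" rule compares every variable with a quantile of the per-permutation maxima. The helper names are hypothetical.

```python
import math


def empirical_quantile(values, q):
    """Simple order-statistic quantile (no interpolation)."""
    ordered = sorted(values)
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]


def local_selection(observed, null_runs, alpha=0.05):
    """Keep a variable if it exceeds the 1 - alpha quantile of its own null."""
    selected = []
    for var, obs in observed.items():
        threshold = empirical_quantile([run[var] for run in null_runs], 1 - alpha)
        if obs > threshold:
            selected.append(var)
    return selected


def global_max_selection(observed, null_runs, alpha=0.05):
    """Keep a variable if it exceeds the 1 - alpha quantile of the null maxima."""
    maxima = [max(run.values()) for run in null_runs]
    threshold = empirical_quantile(maxima, 1 - alpha)
    return [var for var, obs in observed.items() if obs > threshold]


# Illustrative (made-up) proportions: one real fit and five permuted fits.
observed = {"a": 0.60, "b": 0.30, "c": 0.10}
null_runs = [
    {"a": 0.50, "b": 0.45, "c": 0.05},
    {"a": 0.52, "b": 0.44, "c": 0.04},
    {"a": 0.48, "b": 0.46, "c": 0.06},
    {"a": 0.47, "b": 0.50, "c": 0.03},
    {"a": 0.51, "b": 0.44, "c": 0.05},
]
local = local_selection(observed, null_runs)
global_max = global_max_selection(observed, null_runs)
```

With these toy numbers the local rule keeps `a` and `c`, while the global max rule keeps only `a`, which shows why the global max threshold is the more conservative of the two.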
“…Additional variable selection methods using the average use per splitting rule are also available in and described in Bleich et al. These include using permutation sampling to determine an appropriate threshold for the average use per splitting rule based on a null distribution, which would help identify which variables are truly important.…”
Section: Regression
mentioning
confidence: 99%