Variable Selection for Gaussian Process Models in Computer Experiments

Linkletter, Crystal D.; Bingham, Derek; Hengartner, Nicolas W.; Higdon, David; Ye, Kenny

doi:10.1198/004017006000000228

Cited by 140 publications

(128 citation statements)

References 14 publications

Mentioning: 128

“…Although in this article we assumed that all putative QTL are located at the marker positions, it is straightforward to extend the method to consider any candidate QTL in between marker positions as in Wang et al (2005) and Huang et al (2010). A similar nonparametric variable selection procedure has been proposed for computer experiments by Linkletter et al (2006). These authors mainly focused on identifying active factors having nonlinear relationships with the response variable.…”

Section: Discussion (mentioning)

confidence: 99%

“…However, mapping multiple interacting QTL is our main purpose, and our article appears to be the first one to propose modeling the joint action of multiple QTL with an unknown function having a Gaussian process prior, which accommodates any multiway interactions. Moreover, Linkletter et al (2006) consider only a relatively small (<50) number of continuous covariates while in our article and in QTL linkage and association mapping in general, there are a large number of discrete marker covariates (hundreds or thousands) in addition to a small number of environmental, continuous covariates or discrete factors. Therefore, an efficient sampling scheme, such as the hybrid MCMC described in this article, is essential for dealing with these large-scale data sets.…”

Section: Discussion (mentioning)

confidence: 99%

“…Wu et al (2007) proposed a similar idea for variable selection in linear regression models using a set of pseudonull variables. Their method requires no additional repeated analysis as in Linkletter et al (2006) and can also incorporate the linkage structure of the observed markers into the generation of the pseudonull variables. We are planning to extend the method of Wu et al (2007) to our Gaussian process-based QTL selection methodology.…”

Section: Discussion (mentioning)

confidence: 99%

“…Alternatively, we may add pseudonull variable(s) into the model and use the posterior distribution of their g's to guide the variable selection. Linkletter et al (2006) suggested adding a single pseudonull variable but running the analysis many times (say 100). For computational reasons, this approach works for their smaller size problems but is computationally very demanding or infeasible in the QTL mapping context.…”

Section: Discussion (mentioning)

confidence: 99%
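
The pseudonull idea quoted above (add an inert variable, repeat the analysis many times, and use the inert variable's behavior as a reference) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Linkletter et al (2006): the importance score here is a plain absolute correlation standing in for the posterior summary of a Gaussian process hyperparameter, and the names `pseudonull_screen`, `abs_correlation`, `n_runs`, and `quantile` are hypothetical.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation; a stand-in importance score (assumption)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def pseudonull_screen(columns, y, n_runs=100, quantile=0.95, seed=0):
    """Flag variables whose score exceeds a quantile of inert-variable scores."""
    rng = random.Random(seed)
    n = len(y)
    null_scores = []
    for _ in range(n_runs):
        # One freshly generated inert (pseudonull) variable per run.
        inert = [rng.random() for _ in range(n)]
        null_scores.append(abs_correlation(inert, y))
    null_scores.sort()
    threshold = null_scores[min(n_runs - 1, int(quantile * n_runs))]
    # With this marginal score the real variables need scoring only once;
    # a model-based score would be recomputed in every run.
    scores = [abs_correlation(col, y) for col in columns]
    selected = [j for j, s in enumerate(scores) if s > threshold]
    return selected, threshold, scores


# Toy data: only the first of three inputs drives the response.
data_rng = random.Random(42)
n_obs = 200
x0 = [data_rng.random() for _ in range(n_obs)]
x1 = [data_rng.random() for _ in range(n_obs)]
x2 = [data_rng.random() for _ in range(n_obs)]
y = [3.0 * a + data_rng.gauss(0.0, 0.1) for a in x0]

selected, threshold, scores = pseudonull_screen([x0, x1, x2], y)
print("selected:", selected, "threshold:", round(threshold, 3))
```

The cost concern raised in the quoted statement is visible in the loop: each of the `n_runs` repetitions would require a full model fit if the score came from a Gaussian process posterior rather than a closed-form statistic.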

Nonparametric Bayesian Variable Selection With Applications to Multiple Quantitative Trait Loci Mapping With Epistasis and Gene–Environment Interaction

et al. 2010

The joint action of multiple genes is an important source of variation for complex traits and human diseases. However, mapping genes with epistatic effects and gene-environment interactions is a difficult problem because of relatively small sample sizes and very large parameter spaces for quantitative trait locus models that include such interactions. Here we present a nonparametric Bayesian method to map multiple quantitative trait loci (QTL) by considering epistatic and gene-environment interactions. The proposed method is not restricted to pairwise interactions among genes, as is typically done in parametric QTL analysis. Rather than modeling each main and interaction term explicitly, our nonparametric Bayesian method measures the importance of each QTL, irrespective of whether it is mostly due to a main effect or due to some interaction effect(s), via an unspecified function of the genotypes at all candidate QTL. A Gaussian process prior is assigned to this unknown function. In addition to the candidate QTL, nongenetic factors and covariates, such as age, gender, and environmental conditions, can also be included in the unspecified function. The importance of each genetic factor (QTL) and each nongenetic factor/covariate included in the function is estimated by a single hyperparameter, which enters the covariance function and captures any main or interaction effect associated with a given factor/covariate. An initial evaluation of the performance of the proposed method is obtained via analysis of simulated and real data.

Traits showing continuous variation are called quantitative traits and are typically controlled by multiple genetic and nongenetic factors, which tend to have relatively small effects individually. Crosses between inbred lines produce suitable populations for quantitative trait locus (QTL) mapping and are available for agricultural plants and for animal (e.g., mouse) models of human diseases. Such crosses are often used to detect QTL.
For these inbred line crosses, uniform genetic backgrounds, controlled breeding schemes, and controlled environment ensure that there is little or no confounding of uncontrolled sources of variability with genetic effects. The potential for such confounding complicates and limits the analysis and interpretation of human data. Because of the homology between humans and rodents, rodent models can be extremely useful in advancing our understanding of certain human diseases. In the past 2 decades, various statistical approaches have been developed to identify QTL in inbred line crosses (see, for example, Doerge et al. 1997 for review). To perform QTL mapping (identification), a large number of candidate positions (candidate QTL) along the genome are selected. These candidate QTL may all be located at genetic markers (positions of sequence variants in the genome where the genotypes of all individuals in a mapping population can be measured) or also in between markers if the marker density is not high. QTL mapping may then be performed by considering one cand...
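
The abstract's statement that each factor's importance is carried by a single hyperparameter in the covariance function can be illustrated with a small sketch. The parameterization below, a product correlation with one value `rho[k]` in (0, 1] per input, is an assumption chosen for illustration and is not confirmed by the text above as the authors' exact form; the function name `correlation` is hypothetical.

```python
def correlation(x, x_prime, rho):
    """Product correlation with one hyperparameter per input (assumed form).

    rho[k] == 1 removes input k from the correlation entirely, while
    rho[k] near 0 lets the response change quickly along input k.
    """
    r = 1.0
    for xk, xpk, rk in zip(x, x_prime, rho):
        r *= rk ** (4.0 * (xk - xpk) ** 2)
    return r


rho = [0.2, 1.0]          # input 0 is active, input 1 is inert
a = (0.0, 0.25)
b = (0.0, 0.75)           # differs from a only in the inert input
c = (0.5, 0.25)           # differs from a only in the active input

corr_inert = correlation(a, b, rho)
corr_active = correlation(a, c, rho)
print(corr_inert, corr_active)
```

Because a single value per input governs how fast correlation decays along that input, it absorbs main effects and interactions alike, which is the property the abstract relies on for ranking factors.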

“…It can also be helpful to determine the relative importance of each input variable to find out which of them have crucial influence on the system being studied (Linkletter et al, 2006). Variable selection can help in three key aspects: improving the performance of the predictors, providing more time-efficient and cost-effective predictors, and providing a better understanding of the underlying data-generating processes.…”

Section: Introduction (mentioning)

confidence: 99%

Variable selection via a multi-stage strategy

Chang, Lee. 2014. Journal of Applied Statistics

Variable selection for nonlinear regression is a complex problem, made even more difficult when there are a large number of potential covariates and a limited number of datapoints. We propose herein a multi-stage method that combines state-of-the-art techniques at each stage to best discover the relevant variables. At the first stage, an extension of Bayesian Additive Regression Trees is adopted to reduce the total number of variables to around 30. At the second stage, sensitivity analysis in a Treed Gaussian Process is adopted to further reduce the total number of variables. Two stopping rules are designed and sequential design is adopted to make best use of previous information. We demonstrate our approach on two simulated examples and one real dataset.

Sequential design for achieving estimated accuracy of global sensitivities

Guenther, Lee, Gray. 2014. Appl Stoch Models Bus & Ind

Global sensitivity analysis provides information on the relative importance of the input variables for simulator functions used in computer experiments. It is more conclusive than screening methods for determining if a variable is influential, especially if a variable's influence is derived from its interactions with other variables. In this paper, we develop a method for providing global sensitivities with estimated accuracy. A treed Gaussian process serves as a statistical emulator of the black box function. A sequential experimental design makes effective and efficient use of simulator evaluations by adaptively sampling points that are expected to provide the maximum improvement to the emulator model. The method accounts for both sampling error and emulator error.
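
To make "global sensitivities" concrete, the sketch below estimates first-order variance-based (Sobol') indices for a toy function by Monte Carlo. It is an illustration only: it evaluates the function directly instead of a treed Gaussian process emulator, uses no sequential design, and the names `toy_simulator` and `first_order_indices` are hypothetical.

```python
import random


def toy_simulator(x):
    """Toy black box: x[2] is inert, x[1] matters more than x[0]."""
    return x[0] + 2.0 * x[1]


def first_order_indices(f, d, n=20000, seed=1):
    """Monte Carlo first-order indices for independent uniform(0, 1) inputs."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(d)] for _ in range(n)]
    B = [[rng.random() for _ in range(d)] for _ in range(n)]
    fA = [f(a) for a in A]
    fB = [f(b) for b in B]
    outputs = fA + fB
    mean = sum(outputs) / (2 * n)
    var = sum((v - mean) ** 2 for v in outputs) / (2 * n - 1)
    indices = []
    for i in range(d):
        total = 0.0
        for a, b, fa, fb in zip(A, B, fA, fB):
            # Row of A with only coordinate i taken from B.
            ab = a[:i] + [b[i]] + a[i + 1:]
            total += (fb - mean) * (f(ab) - fa)
        indices.append(total / n / var)
    return indices


S = first_order_indices(toy_simulator, d=3)
print([round(s, 2) for s in S])
```

For this toy function the exact first-order indices are 0.2, 0.8, and 0, so the estimates give a quick check that the estimator ranks the inputs as intended. First-order indices alone do not capture influence that arises only through interactions, which is one reason the abstract emphasizes interactions.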
