The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on a single predictor variable. In this paper, we consider constructing the splitting rule from a combination of multiple predictors, such as latent factors, principal components, or a weighted sum of predictors. Such a subgrouping method may lead to a more meaningful partitioning of the population than using a single variable. In addition, our method is based on a change point regression model and thus yields straightforward model-based prediction results. After choosing a particular form for the thresholding variable, we apply a two-stage multiple change point detection method to determine the subgroups and estimate the regression parameters. We show that our approach can produce two or more subgroups from the multiple change points and identify the true grouping with high probability. In addition, our estimators enjoy oracle properties. We design a simulation study to compare the performance of our proposed method with that of existing methods, and we apply them to data sets from a scleroderma trial and a breast cancer study.
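To make the idea of a composite thresholding variable concrete, the following is a minimal sketch, not the two-stage multiple change point procedure described above. It assumes the thresholding variable is the first principal component of the predictors and that there is only a single change point, located by a grid search over candidate thresholds. The function names `first_principal_component` and `fit_single_change_point` and the `trim` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np

def first_principal_component(X):
    """Scores on the first principal component (sign is arbitrary)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

def fit_single_change_point(X, y, z, trim=0.1):
    """Grid search for one threshold tau on z (illustrative assumption:
    exactly two subgroups, each with its own linear regression).

    Returns the threshold minimising the pooled residual sum of squares
    and that minimal value.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    # Skip extreme thresholds so both subgroups keep enough observations.
    candidates = np.sort(z)[int(trim * n): int((1 - trim) * n)]
    best_rss, best_tau = np.inf, None
    for tau in candidates:
        left = z <= tau
        rss = 0.0
        for mask in (left, ~left):
            beta, *_ = np.linalg.lstsq(design[mask], y[mask], rcond=None)
            resid = y[mask] - design[mask] @ beta
            rss += resid @ resid
        if rss < best_rss:
            best_rss, best_tau = rss, tau
    return best_tau, best_rss
```

Once a threshold is chosen, each subject is assigned to a subgroup by comparing its score `z` with `tau`, and predictions come from that subgroup's fitted regression.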
Mendelian randomization is a technique that uses genetic variants to examine the causal effect of a modifiable exposure on a trait in an observational study. Using many instruments can improve estimation precision, but the estimator may suffer from bias when the instruments are only weakly associated with the exposure. To overcome the difficulty of high dimensionality, we propose a model average estimator that uses different subsets of instruments (single nucleotide polymorphisms, SNPs) to predict the exposure in the first stage, and then weights the submodels' predictions using penalization with common penalty functions such as the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). The model-averaged predictions are then used as a genetically predicted exposure to estimate the causal effect on the response in the second stage. The novelty of our model average estimator also lies in allowing the number of submodels and the submodels' sizes to grow with the sample size. The practical performance of the estimator is examined in a series of numerical studies. We apply the proposed method to a real genetic dataset to investigate the relationship between stature and blood pressure.
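The two-stage construction can be sketched as follows. This is a simplified illustration under stated assumptions rather than the estimator studied in the paper: it uses only the LASSO penalty (implemented by a plain coordinate descent), a fixed tuning parameter, and user-supplied SNP subsets. The names `lasso_cd`, `model_average_2sls`, `subsets`, and `lam` are hypothetical.

```python
import numpy as np

def lasso_cd(A, b, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||b - A w||^2 + lam * ||w||_1."""
    n, k = A.shape
    w = np.zeros(k)
    col_sq = (A ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(k):
            # Partial residual with coordinate j removed from the fit.
            r_j = b - A @ w + A[:, j] * w[j]
            rho = A[:, j] @ r_j / n
            # Soft-thresholding update.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

def model_average_2sls(G, x, y, subsets, lam=0.01):
    """G: n-by-p SNP matrix, x: exposure, y: response,
    subsets: list of column-index lists, one per submodel (assumed given)."""
    xc, yc = x - x.mean(), y - y.mean()
    preds = []
    for idx in subsets:
        # Stage 1a: each submodel predicts the exposure from its own SNPs.
        Gs = G[:, idx] - G[:, idx].mean(axis=0)
        coef, *_ = np.linalg.lstsq(Gs, xc, rcond=None)
        preds.append(Gs @ coef)
    P = np.column_stack(preds)
    # Stage 1b: penalized weights combine the submodel predictions.
    w = lasso_cd(P, xc, lam)
    x_hat = P @ w
    # Stage 2: regress the response on the genetically predicted exposure.
    beta = (x_hat @ yc) / (x_hat @ x_hat)
    return beta, w
```

Because the penalty shrinks the weights, the predicted exposure is slightly shrunk as well, so `lam` is kept small in this sketch; in practice the tuning parameter would need to be chosen by a data-driven rule.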
Summary
We present a novel model averaging method to construct a prediction function in semi‐parametric form. The weighted sum of candidate semi‐parametric models is taken as a prediction of the mean response. Marginal non‐parametric regression models are approximated by spline basis functions, and we apply a Bayesian Monte Carlo approach to fit such models. The optimal model weights are estimated by minimising a least squares criterion and have an explicit form. We implement our method in extensive simulation studies and illustrate its use with two real medical data examples. Our methods are demonstrated to be more accurate than both classical parametric model averaging methods and existing semi‐parametric regression models.
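The closed-form weighting step can be illustrated with a short sketch. It makes several assumptions that are not taken from the paper: each candidate model is a marginal regression on one covariate using a cubic truncated-power spline basis, the models are fitted by ordinary least squares as a stand-in for the Bayesian Monte Carlo fit, and the weights are left unconstrained and computed in-sample. All function names are hypothetical.

```python
import numpy as np

def spline_basis(x, n_knots=4):
    """Cubic truncated-power basis with knots at interior quantiles."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_marginal_predictions(X, y):
    """One spline regression per covariate; returns an n-by-K matrix
    whose k-th column holds the fitted values of candidate model k."""
    preds = []
    for j in range(X.shape[1]):
        B = spline_basis(X[:, j])
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        preds.append(B @ coef)
    return np.column_stack(preds)

def least_squares_weights(M, y):
    """Minimiser of ||y - M w||^2; equals (M'M)^{-1} M'y when M has
    full column rank."""
    w, *_ = np.linalg.lstsq(M, y, rcond=None)
    return w
```

The model-averaged prediction is then `M @ w`. Since putting all weight on any single candidate is one feasible choice of `w`, the in-sample squared error of the averaged prediction cannot exceed that of the best individual candidate.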