Background: Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables.
In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion, aic, have been used, based either on the marginal or on the conditional distribution. We show that the marginal aic is not an asymptotically unbiased estimator of the Akaike information, and favours smaller models without random effects. For the conditional aic, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that can lead to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional aic, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package (R Development Core Team, 2010) is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia.
Generalized additive models for location, scale and shape (GAMLSSs) are a popular semiparametric modelling approach that, in contrast with conventional generalized additive models, regress not only the expected mean but also every distribution parameter (e.g. location, scale and shape) to a set of covariates. Current fitting procedures for GAMLSSs are infeasible for high dimensional data set-ups and require variable selection based on (potentially problematic) information criteria. The present work describes a boosting algorithm for high dimensional GAMLSSs that was developed to overcome these limitations. Specifically, the new algorithm was designed to allow the simultaneous estimation of predictor effects and variable selection. The algorithm proposed was applied to Munich rental guide data, which are used by landlords and tenants as a reference for the average rent of a flat depending on its characteristics and spatial features. The net rent predictions that resulted from the high dimensional GAMLSSs were found to be highly competitive and covariate-specific prediction intervals showed a major improvement over classical generalized additive models.
Structured additive regression provides a general framework for complexGaussian and non-Gaussian regression models, with predictors comprising arbitrary combinations of nonlinear functions and surfaces, spatial effects, varying coefficients, random effects and further regression terms. The large flexibility of structured additive regression makes function selection a challenging and important task, aiming at (1) selecting the relevant covariates, (2) choosing an appropriate and parsimonious representation of the impact of covariates on the predictor and (3) determining the required interactions. We propose a spike-and-slab prior structure for function selection that allows to include or exclude single coefficients as well as blocks of coefficients representing specific model terms. A novel multiplicative parameter expansion is required to obtain good mixing and convergence properties in a Markov chain Monte Carlo simulation approach and is shown to induce desirable shrinkage properties. In simulation studies and with (real) benchmark classification data, we investigate sensitivity to hyperparameter settings and compare performance to competitors. The flexibility and applicability of our approach are demonstrated in an additive piecewise exponential model with time-varying effects for right-censored survival times of intensive care patients with sepsis.Geoadditive and additive mixed logit model applications are discussed in an extensive appendix.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.