We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. A Bayesian solution to this problem involves averaging over all possible models (i.e., combinations of predictors) when making inferences about quantities of interest. This approach is often not practical. In this paper we offer two alternative approaches. First, we describe an ad hoc procedure called "Occam's Window" which indicates a small set of models over which a model average can be computed. Second, we describe a Markov chain Monte Carlo approach which directly approximates the exact solution. In the presence of model uncertainty, both these model averaging procedures provide better predictive performance than any single model that might reasonably have been selected. In the extreme case where there are many candidate predictors but no relationship between any of them and the response, standard variable selection procedures often choose some subset of variables that yields a high R² and a highly significant overall F value. In this situation, Occam's Window usually indicates the null model as the only one to be considered, or else a small number of models including the null model, thus largely resolving the problem of selecting significant models when there is no signal in the data. Software to implement our methods is available from StatLib.
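For reference, the model-averaged inference the abstract describes can be written as follows. This is the standard Bayesian model averaging identity; the notation (Δ for a quantity of interest, M_1,…,M_K for the candidate models, D for the data) is supplied here for exposition rather than taken from the abstract:

```latex
% Bayesian model averaging: the posterior of a quantity of interest
% \Delta is a mixture over the candidate models M_1, ..., M_K.
p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D),
\qquad
p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)}.

% Occam's Window replaces the full sum with the subset of models whose
% posterior probability is within a factor C of the best model's:
\mathcal{A} = \Bigl\{ M_k : \frac{\max_l \, p(M_l \mid D)}{p(M_k \mid D)} \le C \Bigr\}.
```

The Markov chain Monte Carlo alternative approximates the same sum by sampling models in proportion to their posterior probability rather than enumerating them.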
Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitting and produces sparse predictive models for text data. We apply this approach to a range of document classification problems and show that it produces compact predictive models at least as effective as those produced by support vector machine classifiers or ridge logistic regression combined with feature selection. We describe our model fitting algorithm, our open source implementations (BBR and BMR), and experimental results.
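MAP estimation under a Laplace (double-exponential) prior on the weights is equivalent to L1-penalized (lasso) logistic regression, which is what produces the sparse models described above. A minimal sketch using scikit-learn rather than the authors' BBR/BMR implementations; the dataset and hyperparameter choices here are illustrative assumptions:

```python
# Minimal sketch: MAP estimation under a Laplace prior on the weights
# is equivalent to L1-penalized logistic regression. This uses
# scikit-learn, not the authors' BBR/BMR implementations.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# A small two-class text problem; any document collection would do.
train = fetch_20newsgroups(subset="train",
                           categories=["sci.med", "sci.space"])

# High-dimensional sparse features, as in the paper's text setting.
X = TfidfVectorizer().fit_transform(train.data)

# penalty="l1" gives the Laplace-prior MAP objective; C controls the
# prior scale (smaller C = sharper prior = sparser model).
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, train.target)

# Sparsity: most coefficients are driven exactly to zero.
nonzero = (clf.coef_ != 0).sum()
print(f"{nonzero} of {clf.coef_.size} features retained")
```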
We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS2 score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS2, but more accurate. An example decision list (estimating survival probability):

if male and adult then survival probability 21% (19%-23%)
else if 3rd class then survival probability 44% (38%-51%)
else if 1st class then survival probability 96% (92%-99%)
else survival probability 88% (82%-94%)
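At prediction time, a decision list is applied by checking the rules in order and returning the outcome of the first antecedent that matches. A minimal sketch encoding the example list above; the function name and attribute encoding are assumptions for illustration:

```python
# Minimal sketch of applying a fitted decision list: rules are checked
# in order and the first matching antecedent determines the prediction.
# The rules are the example list printed above; the attribute names
# (male, adult, passenger_class) are assumed for illustration.
def predict_survival(male: bool, adult: bool, passenger_class: int) -> float:
    """Return the estimated survival probability from the decision list."""
    if male and adult:
        return 0.21
    elif passenger_class == 3:
        return 0.44
    elif passenger_class == 1:
        return 0.96
    else:
        return 0.88  # default rule for anyone not matched above

print(predict_survival(male=True, adult=True, passenger_class=3))   # 0.21
print(predict_survival(male=False, adult=True, passenger_class=1))  # 0.96
```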