There are two competing philosophies of statistical analysis: the Bayesian and the frequentist. The frequentists are much the larger group, and almost all the statistical analyses that appear in the BMJ are frequentist. The Bayesians are much fewer and until recently could only snipe at the frequentists from the high ground of university departments of mathematical statistics. Now the increasing power of computers is bringing Bayesian methods to the fore.
Bayesian methods are based on the idea that unknown quantities, such as population means and proportions, have probability distributions. The probability distribution for a population proportion expresses our prior knowledge or belief about it, before we add the knowledge that comes from our data. For example, suppose we want to estimate the prevalence of diabetes in a health district. We could use the knowledge that the percentage of diabetics in the United Kingdom as a whole is about 2%, so we would expect the prevalence in our health district to be fairly similar. It is unlikely to be 10%, for example. We might have information from other datasets that such rates vary between 1% and 3%, or we might simply guess that the prevalence lies somewhere between these values. We can construct a prior distribution that summarises our beliefs about the prevalence in the absence of specific data. We can do this with a distribution having mean 2% and standard deviation 0.5%, so that the points two standard deviations on either side of the mean are 1% and 3%. (The precise mathematical form of the prior distribution depends on the particular problem.)
Suppose we now collect some data by a sample survey of the district population. We can use the data to modify the prior probability distribution to tell us what we now think the distribution of the population percentage is; this is the posterior distribution.
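The prior and its update can be sketched in a few lines of Python. Because the form of the prior is left open, this sketch assumes a beta distribution, a common conjugate choice for a proportion. Its parameters are chosen by the method of moments to give mean 2% and standard deviation 0.5%, and the survey counts are then simply added to them. The helper names (`beta_from_mean_sd`, `beta_mean_sd`, `update`, `credible_interval`) are hypothetical, and the interval is estimated by simulating from the posterior rather than from an exact quantile function.

```python
import math
import random


def beta_from_mean_sd(mean, sd):
    """Method-of-moments Beta(a, b) parameters for a proportion with the given mean and SD."""
    # For Beta(a, b): variance = mean * (1 - mean) / (a + b + 1); solve for a + b.
    total = mean * (1 - mean) / sd ** 2 - 1
    return mean * total, (1 - mean) * total


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = math.sqrt(mean * (1 - mean) / (a + b + 1))
    return mean, sd


def update(a, b, cases, n):
    """Conjugate update: a beta prior plus `cases` positives out of `n` gives a beta posterior."""
    return a + cases, b + (n - cases)


def credible_interval(a, b, level=0.95, draws=100_000, seed=1):
    """Equal-tailed credible interval estimated from random draws of Beta(a, b)."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a, b) for _ in range(draws))
    k = int(draws * (1 - level) / 2)  # number of draws cut from each tail
    return sample[k], sample[draws - 1 - k]


# Prior belief: prevalence about 2%, SD 0.5% (beta form assumed).
a0, b0 = beta_from_mean_sd(0.02, 0.005)

# Survey data: 15 diabetics among 1000 subjects.
a1, b1 = update(a0, b0, cases=15, n=1000)

post_mean, post_sd = beta_mean_sd(a1, b1)
lo, hi = credible_interval(a1, b1)
print(f"prior: Beta({a0:.2f}, {b0:.2f})")
print(f"posterior mean {post_mean:.1%}, SD {post_sd:.1%}")
print(f"95% credible interval {lo:.1%} to {hi:.1%}")
```

With 15 diabetics among 1000 subjects (the survey figures used in the next paragraph), this sketch gives a posterior mean of about 1.7%, a standard deviation of about 0.3% and an interval of roughly 1.2% to 2.4%. Simulation keeps the example to the standard library; where SciPy is available, `scipy.stats.beta.ppf` would give the exact quantiles instead.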
For example, if we did a survey of 1000 subjects and found 15 (1.5%) to be diabetic, the posterior distribution would have mean 1.7% and standard deviation 0.3%. We can calculate a set of values, a 95% credible interval (1.2% to 2.4% for the example), such that there is a probability of 0.95 that the percentage of diabetics is within this set. The frequentist analysis, which ignores the prior information, would give an estimate of 1.5% with standard error 0.4% and 95% confidence interval 0.8% to 2.5%. This is similar to the result of the Bayesian method, as is usually the case, but the Bayesian method gives an estimate nearer the prior mean and a narrower interval.
Frequentist methods regard the population value as a fixed, unvarying (but unknown) quantity, without a probability distribution. Frequentists then calculate confidence intervals for this quantity, or significance tests of hypotheses concerning it. Bayesians reasonably object that this does not allow us to use our wider knowledge of the problem. Also, it does not provide what researchers seem to want, which is to be able to say that there is a probability of 95% that the population value lies within the 95% confidence interval, or that...