When polymorphism and divergence data are available for multiple loci, extended forms of the McDonald-Kreitman test can be used to estimate the average proportion of the amino acid divergence due to adaptive evolution, a statistic denoted ᾱ. But such tests are subject to many biases. Most serious is the possibility that high estimates of ᾱ reflect demographic changes rather than adaptive substitution. Testing for between-locus variation in α is one possible way of distinguishing between demography and selection. However, such tests have yielded contradictory results, and their efficacy is unclear. Estimates of ᾱ from the same model organisms have also varied widely. This study clarifies the reasons for these discrepancies, identifying several method-specific biases in widely used estimators and assessing the power of the methods. As part of this process, a new maximum-likelihood estimator is introduced. This estimator is applied to a newly compiled data set of 115 genes from Drosophila simulans, each with orthologs from D. melanogaster and D. yakuba. In this way, it is estimated that ᾱ ≈ 0.4 ± 0.1, a value that does not vary substantially between different loci or over different periods of divergence. The implications of these results are discussed.
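As a concrete illustration of the statistic described above, the following minimal Python sketch computes the standard McDonald-Kreitman-based point estimate, alpha = 1 - (Ds * Pn) / (Dn * Ps), where Pn and Ps are counts of nonsynonymous and synonymous polymorphisms and Dn and Ds are the corresponding counts of fixed differences between species. It also shows one simple way of combining loci, by summing each count before applying the same formula. This is not the maximum-likelihood estimator introduced in this study; the container `MKCounts`, the function names, and the counts in the usage note are hypothetical and chosen only for illustration.

```python
from typing import Iterable, NamedTuple


class MKCounts(NamedTuple):
    """McDonald-Kreitman counts for one locus (hypothetical container)."""
    Pn: int  # nonsynonymous polymorphisms
    Ps: int  # synonymous polymorphisms
    Dn: int  # nonsynonymous fixed differences
    Ds: int  # synonymous fixed differences


def alpha_single(c: MKCounts) -> float:
    """Point estimate for one set of counts: 1 - (Ds * Pn) / (Dn * Ps)."""
    if c.Dn == 0 or c.Ps == 0:
        # The ratio is undefined when either denominator count is zero.
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (c.Ds * c.Pn) / (c.Dn * c.Ps)


def alpha_pooled(loci: Iterable[MKCounts]) -> float:
    """Sum each count over loci, then apply the single-locus formula."""
    loci = list(loci)
    total = MKCounts(
        Pn=sum(c.Pn for c in loci),
        Ps=sum(c.Ps for c in loci),
        Dn=sum(c.Dn for c in loci),
        Ds=sum(c.Ds for c in loci),
    )
    return alpha_single(total)


# Made-up counts for two loci, for illustration only.
example_loci = [
    MKCounts(Pn=2, Ps=10, Dn=10, Ds=20),
    MKCounts(Pn=0, Ps=1, Dn=5, Ds=8),
]
```

With these made-up counts, `alpha_single` gives different values for the two loci, and the second value rests on a single synonymous polymorphism, which illustrates why estimates from individual loci with few polymorphisms can be unstable. `alpha_pooled(example_loci)` gives one value based on the summed counts.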
The McDonald-Kreitman test (McDonald and Kreitman 1991; Kreitman and Akashi 1995) is an important technique for quantifying the contribution of positive Darwinian selection to molecular evolution. The test compares levels of polymorphism within a species to measures of divergence between species and relies on the a priori assumption that a certain class of mutations can be treated as effectively neutral. Following McDonald and Kreitman, most studies have focused on protein-coding sequences and used synonymous mutations as their assumed-neutral reference class. As such, the tests compare levels of synonymous polymorphism (P_s) and divergence (D_s) with their nonsynonymous (amino acid changing) equivalents (P_n and D_n). The focus of many studies has been to estimate the proportion of the nonsynonymous divergence, D_n, that was due to adaptive evolution, a statistic that is denoted α.
A serious problem with these tests is that levels of polymorphism are typically low in most population samples at most loci, especially if rare variants are excluded, and this means that single-locus estimates of α can be unreliable. To solve this problem, many methods of combining data from multiple loci have been introduced (Fay et al. 2001; Bustamante et al. 2002; Smith and Eyre-Walker 2002; Sawyer et al. 2003; Bierne and Eyre-Walker 2004). Such methods can be used to estimate ᾱ, the average value of α across the sampled loci. However, it is now clear that different variants of the test have given different results when applied to data from the same model organism. Consider, for example, published results using polymorphism data from Drosophila simulans. Smith and Eyre-Walker (2002) introduced a heuristic estimator of ᾱ that they applied to a data set of 35 loci. Measuring diverg...