Departures from the assumption of homogenously interdigitated neutral and putatively selected sites in the McDonald-Kreitman test can lead to false rejections of the neutral model in the presence of intermediate levels of recombination. This problem is exacerbated by small sample sizes, nonequilibrium demography, and recombination rate variation, as well as in comparisons involving more recently diverged species. I propose that establishing significance levels by coalescent simulation with recombination can improve the fidelity of the test in genomewide scans for selection on noncoding DNA.
THE McDonald-Kreitman (MK) test is a widely used statistical test of the neutral model of evolution, originally proposed for nonrecombining protein sequences (McDonald and Kreitman 1991; Nielsen 2005). The test compares within-species polymorphism and between-species divergence for two distinct classes of sites: synonymous sites, which are assumed to be neutral, and nonsynonymous sites, which are putative targets of selection. The null model assumes that some fraction, f, of nonsynonymous sites is strongly deleterious and contributes negligibly to polymorphism and divergence. For the remaining fraction of nonsynonymous sites, (1 − f), the ratio of polymorphism to divergence is expected to be identical to that for synonymous sites, if both behave according to the neutral model.
Departures from expectations under the neutral model are expected if nonsynonymous sites experience negative or positive selection. If the evolution of (nonlethal) nonsynonymous sites in a particular gene is largely governed by negative (or purifying) selection, this will tend to decrease the level of divergence at nonsynonymous sites more strongly than levels of polymorphism (Kimura 1983). As a result, nonsynonymous sites will exhibit a higher ratio of polymorphism to divergence than putatively neutral synonymous sites. In contrast, if the evolution of nonsynonymous sites is governed primarily by positive selection, this will tend to decrease the ratio of polymorphism to divergence relative to synonymous sites. A departure from the neutral expectation in either direction can be detected by applying a standard statistical test to a 2 × 2 contingency table of polymorphism and divergence counts for synonymous and nonsynonymous sites.
An attractive feature of the MK test is that the inference of selection, although not necessarily its mode, is remarkably robust to assumptions about nonequilibrium demography (Nielsen 2001; Eyre-Walker 2002) and recombination rates (Sawyer and Hartl 1992).
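As an illustration of the 2 × 2 contingency-table comparison described above, the following minimal Python sketch computes the polymorphism-to-divergence ratio for each site class and a two-sided Fisher's exact test. Fisher's exact test is only one possible choice of "standard statistical test", picked here as an assumption. The function names (`fisher_exact_two_sided`, `mk_test`) and the counts are hypothetical and serve only as an example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def mk_test(pn, dn, ps, ds):
    """MK-style comparison of polymorphism (P) and divergence (D) counts.

    pn, dn: nonsynonymous polymorphic and divergent site counts.
    ps, ds: synonymous polymorphic and divergent site counts.
    """
    def ratio(p, d):
        return p / d if d > 0 else float("inf")

    return {
        "nonsyn_P_over_D": ratio(pn, dn),
        "syn_P_over_D": ratio(ps, ds),
        "p_value": fisher_exact_two_sided(pn, dn, ps, ds),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    result = mk_test(pn=5, dn=20, ps=30, ds=15)
    for key, value in result.items():
        print(f"{key}: {value:.4g}")
```

In this hypothetical example the nonsynonymous ratio of polymorphism to divergence is lower than the synonymous ratio, the direction that the text associates with positive selection. The reverse pattern would be the direction associated with negative selection.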
In the absence of recombination, the robustness of the MK test stems from the fact that all surveyed sites share the same genealogy and thus the entries of the 2 × 2 table are sufficient statistics (Nielsen 2001). In the presence of recombination, it is largely owed to the fact that nonsynonymous and synonymous sites are homogenously interdigitated in protein sequences (Figure 1A).
Using synonymous (or other neutral sites) to test for selection in linked noncoding DNA: Althoug...