Most current methods for detecting natural selection from DNA sequence data are limited in that they are based either on summary statistics or on a composite likelihood and, as a consequence, do not make full use of the information available in DNA sequence data. Here we present a new importance sampling approach for approximating the full likelihood function for the selection coefficient. The method treats the ancestral recombination graph (ARG) as a latent variable that is integrated out using previously published Markov chain Monte Carlo (MCMC) methods. The method can be used for detecting selection, estimating selection coefficients, testing models of changes in the strength of selection, estimating the time of the start of a selective sweep, and inferring the allele frequency trajectory of a selected or neutral allele. We perform extensive simulations to evaluate the method and show that it uniformly improves power to detect selection compared to current popular methods such as nSL and SDS under various demographic models, and that it can provide reliable inferences of allele frequency trajectories under many conditions. We also explore the potential of our method to detect extremely recent changes in the strength of selection. We use the method to infer the past allele frequency trajectory for a lactase persistence SNP (MCM6) in Europeans. We also study a set of 11 pigmentation-associated variants. Several genes show evidence of strong selection, particularly within the last 5,000 years, including ASIP, KITLG, and TYR. However, selection on OCA2/HERC2 seems to be much older and, in contrast to previous claims, we find no evidence of selection on TYRP1.
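The general idea of approximating a likelihood by importance sampling over a latent variable can be illustrated with a toy model. The sketch below is not the method's implementation: it replaces the ARG with a scalar latent variable `z`, replaces the selection coefficient with a generic parameter `theta`, and draws exact posterior samples where a real analysis would use MCMC output. All names (`log_prior_ratio`, `is_likelihood_ratio`, `exact_likelihood_ratio`) are hypothetical. It relies on the standard identity L(theta)/L(0) = E[p(z | theta) / p(z | 0)], where the expectation is over z drawn from the posterior under theta = 0.

```python
import math
import random

# Toy model (an assumption for illustration, not the model in the text):
#   latent  z | theta ~ Normal(theta, 1)
#   data    x | z     ~ Normal(z, 1)
# so that, marginally, x | theta ~ Normal(theta, 2).

def log_prior_ratio(z, theta):
    """log[ p(z | theta) / p(z | 0) ] for z ~ Normal(theta, 1)."""
    return theta * z - 0.5 * theta ** 2

def is_likelihood_ratio(latent_samples, theta):
    """Importance sampling estimate of L(theta) / L(0).

    latent_samples must be draws from p(z | x, theta = 0); the data
    term p(x | z) cancels in the importance weights.
    """
    log_w = [log_prior_ratio(z, theta) for z in latent_samples]
    m = max(log_w)  # subtract the maximum for numerical stability
    return math.exp(m) * sum(math.exp(lw - m) for lw in log_w) / len(log_w)

def exact_likelihood_ratio(x, theta):
    """Closed-form L(theta) / L(0) for the toy model, used as a check."""
    return math.exp(0.5 * x * theta - 0.25 * theta ** 2)

rng = random.Random(0)
x_obs = 1.0

# Posterior of z under theta = 0 is Normal(x/2, 1/2). Here it is sampled
# directly; in practice such draws would come from an MCMC sampler.
samples = [rng.gauss(x_obs / 2, math.sqrt(0.5)) for _ in range(100_000)]

# Evaluate the estimated likelihood ratio on a grid of parameter values.
theta_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
for theta in theta_grid:
    est = is_likelihood_ratio(samples, theta)
    exact = exact_likelihood_ratio(x_obs, theta)
    print(f"theta={theta:.2f}  estimate={est:.4f}  exact={exact:.4f}")
```

Because the same set of latent samples is reused for every value of `theta`, the whole likelihood curve is obtained from a single sampling run under the reference parameter value.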
Author summary
Current methods to study natural selection using modern population genomic data are limited in their power and flexibility. Here, we present a new method to infer natural selection that builds on recent methodological advances in estimating genome-wide genealogies. By using importance sampling, we are able to efficiently estimate the likelihood function of the selection coefficient. We show that our method improves power to test for selection over competing methods across a diverse range of scenarios, and that it also accurately infers the selection coefficient. We also demonstrate a novel capability of our model: inferring an allele's frequency over time. We validate these results with a study of a lactase persistence SNP in Europeans, and also study a set of 11 pigmentation-associated variants.
Furthermore, the new method is, to our knowledge, the first that is capable of inferring allele frequency trajectories for models with recombination and selection using only modern data. We accomplish this task using the aforementioned Markovian structure of both coalescence and the trajectory, forming a hidden Markov model (HMM) over these two hidden states and solving for the posterior marginals of each hidden allele frequency state over time. Recently, Edge & Coop propose...