Detecting selective sweeps from genomic SNP data is complicated by the intricate ascertainment schemes used to discover SNPs, and by the confounding influence of the underlying complex demographics and varying mutation and recombination rates. Current methods for detecting selective sweeps have little or no robustness to the demographic assumptions and varying recombination rates, and provide no method for correcting for ascertainment biases. Here, we present several new tests aimed at detecting selective sweeps from genomic SNP data. Using extensive simulations, we show that a new parametric test, based on composite likelihood, has a high power to detect selective sweeps and is surprisingly robust to assumptions regarding recombination rates and demography (i.e., has low Type I error). Our new test also provides estimates of the location of the selective sweep(s) and the magnitude of the selection coefficient. To illustrate the method, we apply our approach to data from the Seattle SNP project and to Chromosome 2 data from the HapMap project. In Chromosome 2, the most extreme signal is found in the lactase gene, which previously has been shown to be undergoing positive selection. Evidence for selective sweeps is also found in many other regions, including genes known to be associated with disease risk such as DPP10 and COL4A3.
The hitchhiking effect of a beneficial mutation, or a selective sweep, generates a unique distribution of allele frequencies and spatial distribution of polymorphic sites. A composite-likelihood test was previously designed to detect these signatures of a selective sweep, solely on the basis of the spatial distribution and marginal allele frequencies of polymorphisms. As an excess of linkage disequilibrium (LD) is also known to be a strong signature of a selective sweep, we investigate how much statistical power is increased by the inclusion of information regarding LD. The expected pattern of LD is predicted by a genealogical approach. Both theory and simulation suggest that strong LD is generated in narrow regions on both sides of the location of the beneficial mutation. However, a lack of LD is expected across the two sides. We explore various ways to detect this signature of selective sweeps by statistical tests. A new composite-likelihood method is proposed to incorporate information regarding LD. This method enables us to detect selective sweeps and estimate the parameters of the selection model better than the previous composite-likelihood method that does not take LD into account. However, the improvement made by including LD is rather small, suggesting that most of the relevant information regarding selective sweeps is captured by the spatial distribution and marginal allele frequencies of polymorphisms.
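The LD signature described above (strong LD within each flank, little LD across the sweep site) can be illustrated with a minimal sketch. This is not the composite-likelihood method of the abstract; it only computes the standard pairwise r² statistic and averages it separately for same-side and opposite-side SNP pairs. The function names, the 0/1 haplotype coding, and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Pairwise r^2 between two biallelic sites coded 0/1 over the same haplotypes."""
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                    # at least one site is monomorphic
    d = p_ab - p_a * p_b                       # coefficient of disequilibrium D
    return d * d / denom


def ld_within_vs_across(sites, positions, focal_pos):
    """Mean r^2 for SNP pairs on the same side of focal_pos vs. pairs spanning it.

    focal_pos is a hypothetical candidate sweep location (an assumption here).
    """
    within, across = [], []
    for i, j in combinations(range(len(sites)), 2):
        r2 = r_squared(sites[i], sites[j])
        if r2 != r2:                           # skip NaN (monomorphic site)
            continue
        same_side = (positions[i] < focal_pos) == (positions[j] < focal_pos)
        (within if same_side else across).append(r2)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(across)


# Toy data: two perfectly linked SNPs on each flank, unlinked across flanks.
toy_sites = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 0, 1]]
toy_positions = [10, 20, 80, 90]
within_r2, across_r2 = ld_within_vs_across(toy_sites, toy_positions, focal_pos=50)
```

In the toy data the same-side mean is high and the across-side mean is low, which is the qualitative contrast the abstract describes; real data would need phased haplotypes and a significance assessment.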
The process of strong artificial selection during a domestication event is modeled, and its effect on the pattern of DNA polymorphism is investigated. The model also considers population bottleneck during domestication. Artificial selection during domestication is different from a regular selective sweep because artificial selection acts on alleles that may have been neutral variants before domestication. Therefore, the fixation of such a beneficial allele does not always wipe out DNA variation in the surrounding region. The amount by which variation is reduced largely depends on the initial frequency of the beneficial allele, p. As a consequence, p has a strong effect on the likelihood of detecting the signature of selection during domestication from patterns of polymorphism. These theoretical results are discussed in light of data collected from maize. Although the main focus of this article is on domestication, this model can also be generalized to describe selective sweeps from standing genetic variation.
Keywords: population genetics | theory | coalescent | domestication selection
Artificial selection is believed to be the main evolutionary force acting on domesticated species since their origin 5,000-10,000 years ago. During domestication, humans exercised extremely strong selective pressure on ancestral gene pools to achieve desired phenotypic characteristics. These beneficial phenotypes were therefore fixed in the founder population of domesticated species in a short (probably very short) time. These fixation events differ from the fixation of an advantageous mutant in a natural population, in that artificial selection in a domestication event acts on an allele that was likely a neutral or nearly neutral variant before domestication. In other words, domestication causes some neutral polymorphisms in the ancestral population of the wild progenitor species to suddenly become very advantageous in the small founder population, the progenitor of the domesticated species.
Therefore, the initial frequency of a beneficial allele (p) before domestication is not necessarily low. In contrast, the initial frequency of an advantageous mutant in a regular selective sweep model is 1/(2N) (1), where N is the diploid population size. Hence, models developed to describe selective sweeps in natural populations may not be appropriate for cases in which alleles are fixed from standing genetic variation, such as has been described for an amino acid variant at the CAULIFLOWER gene in Brassica (2). In this article, a model for this process of strong artificial selection during a domestication event is developed. In addition to artificial selection, the model incorporates a population size bottleneck during domestication so that the level of polymorphism in the cultivated species is expected to be lower than that in its wild progenitor species (3, 4). In cultivated crops, polymorphism is typically reduced by 60-80% (5). Under this model, the patterns of DNA polymorphism both with and without selection are studied to understand the genetic ...
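The contrast between a new mutant starting at 1/(2N) and a standing variant starting at frequency p can be made concrete with a small sketch. It is not the coalescent model developed in the article; it assumes a deterministic trajectory under genic selection with a hypothetical selection coefficient s, and simply counts the generations the favored allele needs to rise from its starting frequency to 1 - 1/(2N). All names and parameter values are illustrative assumptions.

```python
def generations_to_near_fixation(p0, s, n_diploid):
    """Generations for a favored allele to rise from p0 to 1 - 1/(2N).

    Assumes deterministic genic selection: p' = p(1 + s) / (1 + p*s).
    Drift, dominance, and the bottleneck are ignored in this sketch.
    """
    if not 0 < p0 < 1:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive for a beneficial allele")
    threshold = 1 - 1 / (2 * n_diploid)
    p, generations = p0, 0
    while p < threshold:
        p = p * (1 + s) / (1 + p * s)   # one generation of selection
        generations += 1
    return generations


N = 1000          # hypothetical diploid population size
S = 0.05          # hypothetical selection coefficient

t_new_mutant = generations_to_near_fixation(1 / (2 * N), S, N)  # classic sweep
t_standing = generations_to_near_fixation(0.1, S, N)            # standing variant, p = 0.1
```

Under these assumptions the standing variant reaches near-fixation in fewer generations than the new mutant. This only illustrates the difference in starting conditions; the article's point about retained variation also rests on the allele being present on several haplotype backgrounds before selection begins, which this sketch does not model.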
In 2002, Kim and Stephan proposed a promising composite-likelihood method for localizing and estimating the fitness advantage of a recently fixed beneficial mutation. Here, we demonstrate that their composite-likelihood-ratio (CLR) test comparing selective and neutral hypotheses is not robust to undetected population structure or a recent bottleneck, with some parameter combinations resulting in a false positive rate of nearly 90%. We also propose a goodness-of-fit test for discriminating rejections due to directional selection (true positives) from those due to population and demographic forces (false positives) and demonstrate that the new method has high sensitivity to differentiate the two classes of rejections.
The origin of domesticated Asian rice (Oryza sativa) has been a contentious topic, with conflicting evidence for either single or multiple domestication of this key crop species. We examined the evolutionary history of domesticated rice by analyzing de novo assembled genomes from domesticated rice and its wild progenitors. Our results indicate multiple origins, where each domesticated rice subpopulation (japonica, indica, and aus) arose separately from progenitor O. rufipogon and/or O. nivara. Coalescence-based modeling of demographic parameters estimates that the first domesticated rice population to split off from O. rufipogon was O. sativa ssp. japonica, occurring at ∼13.1–24.1 ka, which is an order of magnitude older than the earliest archeological date of domestication. This date is consistent, however, with the expansion of O. rufipogon populations after the Last Glacial Maximum ∼18 ka and archeological evidence for early wild rice management in China. We also show that there is significant gene flow from japonica to both indica (∼17%) and aus (∼15%), which led to the transfer of domestication alleles from early-domesticated japonica to proto-indica and proto-aus populations. Our results provide support for a model in which different rice subspecies had separate origins, but that de novo domestication occurred only once, in O. sativa ssp. japonica, and introgressive hybridization from early japonica to proto-indica and proto-aus led to domesticated indica and aus rice.