The advent of modern DNA sequencing technology is the driving force in obtaining complete intraspecific genomes that can be used to detect loci that have been subject to positive selection in the recent past. Based on selective sweep theory, beneficial loci can be detected by examining the single nucleotide polymorphism patterns in intraspecific genome alignments. In the last decade, a plethora of algorithms for identifying selective sweeps have been developed. However, the majority of these algorithms have not been designed for analyzing whole-genome data. We present SweeD (Sweep Detector), an open-source tool for the rapid detection of selective sweeps in whole genomes. It analyzes site frequency spectra and represents a substantial extension of the widely used SweepFinder program. The sequential version of SweeD is up to 22 times faster than SweepFinder and, more importantly, is able to analyze thousands of sequences. We also provide a parallel implementation of SweeD for multi-core processors. Furthermore, we implemented a checkpointing mechanism that allows SweeD to be deployed on cluster systems with queue execution time restrictions and long-running analyses to be resumed after processor failures. In addition, the user can specify various demographic models via the command line to calculate their theoretically expected site frequency spectra. Therefore, in contrast to SweepFinder, the neutral site frequencies can optionally be calculated directly from a given demographic model. We show that an increase in sample size results in more precise detection of positive selection. Thus, the ability to analyze substantially larger sample sizes by using SweeD leads to more accurate sweep detection. We validate SweeD via simulations and by scanning human chromosome 1 from the 1000 Genomes Project for selective sweeps. We compare SweeD results with results from a linkage-disequilibrium-based approach and identify common outliers.
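To make the notion of a site frequency spectrum concrete, the following minimal sketch computes an unfolded SFS from a small 0/1 haplotype matrix and the class proportions expected under the standard neutral model with constant population size, where the expected number of sites with i derived copies is proportional to 1/i. It is an illustration only and not SweeD's or SweepFinder's implementation; the function names and the input layout (rows are sequences, columns are segregating sites, 1 marks the derived allele) are assumptions made for this example.

```python
import numpy as np


def unfolded_sfs(haplotypes: np.ndarray) -> np.ndarray:
    """Count sites by derived-allele frequency class.

    Assumed layout (hypothetical, for illustration): rows are sequences,
    columns are segregating sites, 1 = derived allele, 0 = ancestral.
    Returns an array of length n-1 whose entry i-1 is the number of sites
    at which exactly i of the n sequences carry the derived allele.
    """
    n = haplotypes.shape[0]
    derived_counts = haplotypes.astype(np.int64).sum(axis=0)
    # counts[i] = number of sites with i derived copies, for i = 0..n
    counts = np.bincount(derived_counts, minlength=n + 1)
    # Drop the monomorphic classes i = 0 and i = n
    return counts[1:n]


def neutral_sfs_proportions(n: int) -> np.ndarray:
    """Expected class proportions under the standard neutral model.

    With constant population size, the expected count in class i is
    proportional to 1/i, so the proportions are (1/i) / sum_j (1/j).
    """
    weights = 1.0 / np.arange(1, n)
    return weights / weights.sum()


# Toy data: 4 sequences, 4 segregating sites
haplotypes = np.array([
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
])

observed = unfolded_sfs(haplotypes)
expected = neutral_sfs_proportions(haplotypes.shape[0])
print("observed counts per class:", observed)
print("neutral proportions per class:", expected)
```

SFS-based sweep scans compare local spectra against a background spectrum of this kind; an excess of low- and high-frequency derived variants relative to the background is the pattern such methods look for.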
Drosophila melanogaster spread from sub-Saharan Africa to the rest of the world, colonizing new environments. Here, we modeled the joint demography of African (Zimbabwe), European (The Netherlands), and North American (North Carolina) populations using an approximate Bayesian computation (ABC) approach. By testing different models (including scenarios with continuous migration), we found that admixture between Africa and Europe most likely generated the North American population, with an estimated proportion of African ancestry of 15%. We also revisited the demography of the ancestral population (Africa) and found, in contrast to previous work, that a bottleneck fits the history of the population of Zimbabwe better than expansion. Finally, we compared the site-frequency spectrum of the ancestral population to analytical predictions under the estimated bottleneck model.
To date, several studies have confirmed that Drosophila melanogaster originated in sub-Saharan Africa and spread to the rest of the world (Lachaise et al. 1988; David and Capy 1988; Begun and Aquadro 1993; Andolfatto 2001; Stephan and Li 2007). Given its cosmopolitan distribution, we expect that different populations have evolved and adapted differently to distinct environments, making D. melanogaster a perfect study system for both adaptation and population history. Extensive research has been performed to detect signatures of adaptation at the genome level (Sabeti et al. 2006; Li and Stephan 2006; Zayed and Whitfield 2008). Such detection usually depends on the underlying demographic scenario, since demographic events can leave patterns on the genome similar to those left by adaptive (selective) events (Kim and Stephan 2002; Glinka et al. 2003; Jensen et al. 2005; Nielsen et al. 2005; Pavlidis et al. 2008, 2010a).
Therefore, a better understanding of the demography of a population will not only allow us to estimate past and present population sizes and the times of population size changes but will also decrease the rate of false positives among signatures of adaptation. Here we study the demography of African, European, and North American populations, with an emphasis on the North American population. There is evidence that D. melanogaster colonized North America <200 years ago (Johnson 1913; Sturtevant 1920; Keller 2007). D. melanogaster (then known as D. ampelophila) was first reported in New York in 1875 by New York State entomologist Lintner (Lintner 1882; Keller 2007). In 1879 several articles were published reporting the appearance of D. melanogaster in several parts of eastern North America, including Connecticut and Massachusetts (Johnson 1913). At that time the dipteran fauna was very well described. It is therefore unlikely that entomologists would have overlooked D. melanogaster for long (Keller 2007). Less than 25 years after its introduction, D. melanogaster became the most common dipteran species in North America (Howard 1900). Johnson (1913) suggested that North America could have been colonized from the tropics, since the fir...
Despite major progress in dissecting the molecular pathways that control DNA methylation patterns in plants, little is known about the mechanisms that shape plant methylomes over evolutionary time. Drawing on recent intra- and interspecific epigenomic studies, we show that methylome evolution over long timescales is largely a byproduct of genomic changes. By contrast, methylome evolution over short timescales appears to be driven mainly by spontaneous epimutational events. We argue that novel methods based on analyses of the methylation site frequency spectrum (mSFS) of natural populations can provide deeper insights into the evolutionary forces that act at each timescale.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-1127-5) contains supplementary material, which is available to authorized users.
Seed banks are a common characteristic of many plant species and allow genetic diversity to be stored in the soil as dormant seeds for various periods of time. We investigate an above-ground population following a Fisher-Wright model with selection, coupled with a deterministic seed bank, assuming that the length of the seed bank is kept constant and the number of seeds is large. To assess the combined impact of seed banks and selection on genetic diversity, we derive a general diffusion model. The applied techniques outline a path for approximating a stochastic delay differential equation by an appropriately rescaled stochastic differential equation, which is a common issue in statistical physics. We compute the equilibrium solution of the site-frequency spectrum and derive the times to fixation of an allele with and without selection. Finally, we demonstrate that seed banks enhance the effect of selection on the site-frequency spectrum while slowing down the approach to mutation-selection equilibrium.
There is currently great interest in distinguishing the signatures of genetic variation produced by demographic events from those produced by natural selection. We propose a simple multilocus statistical test to identify candidate sites of selective sweeps with high power. The test is based on the variability profile measured in an array of linked microsatellites. We also show that the analysis of flanking markers drastically reduces the number of false positives among the candidates that are identified in a genome-wide survey of unlinked loci and find that this property is maintained in many population-bottleneck scenarios. However, for a certain range of intermediately severe population bottlenecks we find genomic signatures that are very similar to those produced by a selective sweep. While in these worst-case scenarios the power of the proposed test remains high, the false-positive rate reaches values close to 50%. Hence, selective sweeps may be hard to identify even if multiple linked loci are analyzed. Nevertheless, the integration of information from multiple linked loci always leads to a considerable reduction of the false-positive rate compared to a genome scan of unlinked loci. We discuss the application of this test to experimental data from Drosophila melanogaster.