Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
Motivation Genome sequencing projects have revealed frequent gains and losses of genes between species. Previous versions of our software, Computational Analysis of gene Family Evolution (CAFE), have allowed researchers to estimate parameters of gene gain and loss across a phylogenetic tree. However, the underlying model assumed that all gene families had the same rate of evolution, despite evidence suggesting a large amount of variation in rates among families. Results Here, we present CAFE 5, a completely re-written software package with numerous performance and user-interface enhancements over previous versions. These include improved support for multithreading, the explicit modeling of rate variation among families using gamma-distributed rate categories, and command-line arguments that preclude the use of accessory scripts. Availability and implementation CAFE 5 source code, documentation, test data and a detailed manual with examples are freely available at https://github.com/hahnlab/CAFE5/releases. Contact danvand@indiana.edu Supplementary information Supplementary data are available at Bioinformatics online.
Substitution rates are known to be variable among genes, chromosomes, species, and lineages due to multifarious biological processes. Here, we consider another source of substitution rate variation due to a technical bias associated with gene tree discordance. Discordance has been found to be rampant in genome-wide data sets, often due to incomplete lineage sorting (ILS). This apparent substitution rate variation is caused when substitutions that occur on discordant gene trees are analyzed in the context of a single, fixed species tree. Such substitutions have to be resolved by proposing multiple substitutions on the species tree, and we therefore refer to this phenomenon as Substitutions Produced by ILS (SPILS). We use simulations to demonstrate that SPILS has a larger effect with increasing levels of ILS, and on trees with larger numbers of taxa. Specific branches of the species trees are consistently, but erroneously, inferred to be longer or shorter, and we show that these branches can be predicted based on discordant tree topologies. Moreover, we observe that fixing a species tree topology when performing tests of positive selection increases the false positive rate, particularly for genes whose discordant topologies are most affected by SPILS. Finally, we use data from multiple Drosophila species to show that SPILS can be detected in nature. Although the effects of SPILS are modest per gene, it has the potential to affect substitution rate variation whenever high levels of ILS are present, particularly in rapid radiations. The problems outlined here have implications for character mapping of any type of trait, and for any biological process that causes discordance. We discuss possible solutions to these problems, and areas in which they are likely to have caused faulty inferences of convergence and accelerated evolution.
Characterizing the nature of the adaptive process at the genetic level is a central goal for population genetics. In particular, we know little about the sources of adaptive substitution or about the number of adaptive variants currently segregating in nature. Historically, population geneticists have focused attention on the hard-sweep model of adaptation in which a de novo beneficial mutation arises and rapidly fixes in a population. Recently more attention has been given to soft-sweep models, in which alleles that were previously neutral, or nearly so, drift until such a time as the environment shifts and their selection coefficient changes to become beneficial. It remains an active and difficult problem, however, to tease apart the telltale signatures of hard vs. soft sweeps in genomic polymorphism data. Through extensive simulations of hard-and soft-sweep models, here we show that indeed the two might not be separable through the use of simple summary statistics. In particular, it seems that recombination in regions linked to, but distant from, sites of hard sweeps can create patterns of polymorphism that closely mirror what is expected to be found near soft sweeps. We find that a very similar situation arises when using haplotype-based statistics that are aimed at detecting partial or ongoing selective sweeps, such that it is difficult to distinguish the shoulder of a hard sweep from the center of a partial sweep. While knowing the location of the selected site mitigates this problem slightly, we show that stochasticity in signatures of natural selection will frequently cause the signal to reach its zenith far from this site and that this effect is more severe for soft sweeps; thus inferences of the target as well as the mode of positive selection may be inaccurate. In addition, both the time since a sweep ends and biologically realistic levels of allelic gene conversion lead to errors in the classification and identification of selective sweeps. This general problem of "soft shoulders" underscores the difficulty in differentiating soft and partial sweeps from hard-sweep scenarios in molecular population genomics data. The soft-shoulder effect also implies that the more common hard sweeps have been in recent evolutionary history, the more prevalent spurious signatures of soft or partial sweeps may appear in some genome-wide scans.KEYWORDS natural selection; population genetics; selective sweeps T HERE is a growing body of evidence that positive directional selection has a pervasive impact on genetic variation within and between species (reviewed in Hahn 2008; Sella et al. 2009;Cutter and Payseur 2013). In stark contrast to predictions from Kimura's neutral theory of molecular evolution (Kimura 1968), there is strong evidence that in at least some taxa a large fraction of nucleotide differences between species have been fixed by positive selection (Fay et al. 2001(Fay et al. , 2002Smith and Eyre-Walker 2002;Begun et al. 2007;Boyko et al. 2008;Langley et al. 2012). A great deal of emphasis has there...
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.