Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to flag rate estimates that come from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, so a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for the interpretation of its results.
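The two pass criteria described above can be written down directly. The following is a minimal sketch, assuming each rate estimate is summarized by its posterior mean and 95% credible interval, and reading "does not fall within the 95% credible intervals" as "falls outside every randomized replicate's interval". The names (`RateEstimate`, `randomize_dates`, `passes_mean_criterion`, `passes_interval_criterion`) are hypothetical. The randomization shown is a plain permutation of tip dates only, not the modified scheme needed when sampling times are unevenly spread across the tree. The numbers in the usage lines are invented for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RateEstimate:
    """Posterior summary of a substitution rate (hypothetical container)."""
    mean: float
    lower: float  # lower bound of the 95% credible interval
    upper: float  # upper bound of the 95% credible interval


def randomize_dates(dates: Sequence[float], seed: int) -> List[float]:
    """Return a copy of the tip sampling times, randomly permuted among tips."""
    shuffled = list(dates)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def passes_mean_criterion(observed: RateEstimate,
                          randomized: Sequence[RateEstimate]) -> bool:
    """Original criterion: the observed mean lies outside every randomized 95% CI."""
    return all(not (r.lower <= observed.mean <= r.upper) for r in randomized)


def passes_interval_criterion(observed: RateEstimate,
                              randomized: Sequence[RateEstimate]) -> bool:
    """Conservative criterion: the observed 95% CI overlaps no randomized 95% CI."""
    return all(observed.upper < r.lower or observed.lower > r.upper
               for r in randomized)


# Illustrative values only (rates in units of 1e-3 substitutions/site/year).
observed = RateEstimate(mean=1.2, lower=0.6, upper=1.9)
randomized = [RateEstimate(0.2, 0.05, 0.7), RateEstimate(0.3, 0.1, 0.8)]

print(passes_mean_criterion(observed, randomized))      # True
print(passes_interval_criterion(observed, randomized))  # False: 0.6-1.9 overlaps 0.05-0.7
```

In a real analysis each call to `randomize_dates` (with a different seed) would produce the tip dates for one replicate, which would then be re-analyzed with Bayesian phylogenetic software to obtain the randomized `RateEstimate` values; that step is not shown. Because an estimate's mean always lies inside its own credible interval, anything that passes the interval criterion also passes the mean criterion, but not the reverse, as the example shows.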
A fundamental challenge in resolving evolutionary relationships across the tree of life is to account for heterogeneity in the evolutionary signal across loci. Studies of marsupial mammals have demonstrated that this heterogeneity can be substantial, leaving considerable uncertainty in the evolutionary timescale and relationships within the group. Using simulations and a new phylogenomic data set comprising nucleotide sequences of 1550 loci from 18 of the 22 extant marsupial families, we demonstrate the power of a method for identifying clusters of loci that support different phylogenetic trees. We find two distinct clusters of loci, each providing an estimate of the species tree that matches previously proposed resolutions of the marsupial phylogeny. We also identify a well-supported placement for the enigmatic marsupial moles (Notoryctes) that contradicts previous molecular estimates but is consistent with morphological evidence. The pattern of gene-tree variation across tree-space is characterized by changes in information content, GC content, substitution-model adequacy, and signatures of purifying selection in the data. In a simulation study, we show that incomplete lineage sorting can explain the division of loci into the two tree-topology clusters, as found in our phylogenomic analysis of marsupials. We also demonstrate the potential benefits of minimizing uncertainty from phylogenetic conflict for molecular dating. Our analyses reveal that Australasian marsupials appeared in the early Paleocene, whereas the diversification of present-day families occurred primarily during the late Eocene and early Oligocene. Our methods provide an intuitive framework for improving the accuracy and precision of phylogenetic inference and molecular dating using genome-scale data.
Molecular dating analyses allow evolutionary timescales to be estimated from genetic data, offering an unprecedented capacity for investigating the evolutionary past of all species. These methods require us to make assumptions about the relationship between genetic change and evolutionary time, often referred to as a 'molecular clock'. Although initially regarded with scepticism, molecular dating has now been adopted in many areas of biology. This broad uptake has been due partly to the development of Bayesian methods that allow complex aspects of molecular evolution, such as variation in rates of change across lineages, to be taken into account. But in order to do this, Bayesian dating methods rely on a range of assumptions about the evolutionary process, which vary in their degree of biological realism and empirical support. These assumptions can have substantial impacts on the estimates produced by molecular dating analyses. The aim of this review is to open the 'black box' of Bayesian molecular dating and have a look at the machinery inside. We explain the components of these dating methods, the important decisions that researchers must make in their analyses, and the factors that need to be considered when interpreting results. We illustrate the effects that the choices of different models and priors can have on the outcome of the analysis, and suggest ways to explore these impacts. We describe some major research directions that may improve the reliability of Bayesian dating. The goal of our review is to help researchers to make informed choices when using Bayesian phylogenetic methods to estimate evolutionary rates and timescales.
Evolutionary timescales can be estimated from genetic data using phylogenetic methods based on the molecular clock. To account for molecular rate variation among lineages, a number of relaxed-clock models have been developed. Some of these models assume that rates vary among lineages in an autocorrelated manner, so that closely related species share similar rates. In contrast, uncorrelated relaxed clocks allow all of the branch-specific rates to be drawn from a single distribution, without assuming any correlation between rates along neighbouring branches. There is uncertainty about which of these two classes of relaxed-clock models is more appropriate for biological data. We present an R package, NELSI, that allows the evolution of DNA sequences to be simulated according to a range of clock models. Using data generated by this package, we assessed the ability of two Bayesian phylogenetic methods to distinguish among different relaxed-clock models and to quantify rate variation among lineages. The results of our analyses show that rate autocorrelation is typically difficult to detect, even when there is complete taxon sampling. This provides a potential explanation for past failures to detect rate autocorrelation in a range of data sets.
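To make the contrast between the two classes of relaxed clock concrete, here is a minimal Python sketch of how branch rates could be drawn under each. It is not the NELSI API (NELSI is an R package, and its functions are not reproduced here); the tree, function names and parameter values are hypothetical. As a simplifying assumption, the autocorrelated version draws each branch rate from a lognormal whose expected value equals the parent branch's rate, with a fixed spread, whereas many autocorrelated models scale that spread by the time elapsed along the branch.

```python
import math
import random
from typing import Dict, Iterator, List, Tuple

# Hypothetical rooted tree: each internal node maps to its two children.
TREE: Dict[str, List[str]] = {
    "root": ["A", "B"],
    "A": ["t1", "t2"],
    "B": ["t3", "t4"],
}


def branches(tree: Dict[str, List[str]], node: str = "root") -> Iterator[Tuple[str, str]]:
    """Yield (parent, child) pairs in preorder, so parents come before children."""
    for child in tree.get(node, []):
        yield node, child
        yield from branches(tree, child)


def uncorrelated_rates(tree, mean_rate: float, sigma: float,
                       rng: random.Random) -> Dict[str, float]:
    """Draw every branch rate independently from one lognormal distribution."""
    # Choose mu so that the lognormal's mean equals mean_rate.
    mu = math.log(mean_rate) - sigma ** 2 / 2
    return {child: rng.lognormvariate(mu, sigma) for _, child in branches(tree)}


def autocorrelated_rates(tree, root_rate: float, sigma: float,
                         rng: random.Random) -> Dict[str, float]:
    """Draw each branch rate around its parent's rate (simplified autocorrelation)."""
    rates = {"root": root_rate}
    for parent, child in branches(tree):
        # Expected child rate equals the parent's rate; spread is fixed (assumption).
        mu = math.log(rates[parent]) - sigma ** 2 / 2
        rates[child] = rng.lognormvariate(mu, sigma)
    del rates["root"]  # the root has no branch of its own
    return rates


print(uncorrelated_rates(TREE, 1.0, 0.5, random.Random(1)))
print(autocorrelated_rates(TREE, 1.0, 0.5, random.Random(1)))
```

Keying each rate by the child node is a convenient way to identify the branch leading to that node. Under the autocorrelated sketch, sister branches such as `t1` and `t2` are drawn around a shared parent rate, so they tend to be similar; under the uncorrelated sketch they are independent draws from the same distribution.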