Mutations that accumulate in the genome of cells or viruses can be used to infer their evolutionary history. In the case of rapidly evolving organisms, genomes can reveal their detailed spatiotemporal spread. Such phylodynamic analyses are particularly useful to understand the epidemiology of rapidly evolving viral pathogens. As the number of genome sequences available for different pathogens has increased dramatically over the last years, phylodynamic analysis with traditional methods becomes challenging as these methods scale poorly with growing datasets. Here, we present TreeTime, a Python-based framework for phylodynamic analysis using an approximate Maximum Likelihood approach. TreeTime can estimate ancestral states, infer evolution models, reroot trees to maximize temporal signals, estimate molecular clock phylogenies and population size histories. The runtime of TreeTime scales linearly with dataset size.
Mutations that accumulate in the genome of replicating biological organisms can be used to infer their evolutionary history. In the case of measurably evolving organisms genomes often reveal their detailed spatiotemporal spread. Such phylodynamic analyses are particularly useful to understand the epidemiology of rapidly evolving viral pathogens. The number of genome sequences available for different pathogens, however, has increased dramatically over the last couple of years and traditional methods for phylodynamic analysis scale poorly with growing data sets. Here, we present TreeTime, a Python based framework for phylodynamic analysis using an approximate Maximum Likelihood approach. TreeTime can estimate ancestral states, infer evolution models, reroot trees to maximize temporal signals, estimate molecular clock phylogenies and population size histories. The run time of TreeTime scales linearly with data set size.Phylogenetics uses differences between homologous sequences to infer the history of the sample and learn about the evolutionary processes that gave rise to the observed diversity. In absence of recombination, this history is a tree along which sequences descend from ancestors with modification. In general, the reconstruction of phylogenetic trees is a computationally difficult problem but efficient heuristics often produce reliable reconstructions in polynomial time (Felsenstein, 2004;Price et al., 2010;Stamatakis, 2014). Such heuristics become indispensable for large data sets of hundreds or thousands of sequences.In addition to tree building, many research questions require parameter inference and hypothesis testing (Drummond et al., 2012;Pond and Muse, 2005). Again, exact inference and testing with large data sets is computationally expensive since it requires high-dimensional optimization of complex likelihood functions or extensive sampling of the posterior. Efficient heuristics are needed to cope with the growing data sets available today.Phylogenetic tree reconstruction relies on substitutions that accumulated between the common ancestor and the sequences in the sample. Zuckerkandl and Pauling (1965) hypothesized that changes in protein sequences accumulate linearly and that the number of differences between homologous sequences can be used as a molecular clock to date the divergence between sequences. The molecular clock has since been used to date the divergence of ancient proteins billions of years ago as well as the spread of RNA viruses on time scales less than a year (Langley and Fitch, 1974;Rambaut, 2000;Sanderson, 2003;Yoder and Yang, 2000). Beyond dating of individual divergence events of the time of the common ancestor, algorithms have been developed to infer a tree where branch lengths correspond directly to elapsed time and each node is placed such that its position reflects its known or inferred date. Such trees are known as time trees, molecular clock phylogenies, or time stamped phylogenies. Furthermore, it became evident that substitution rates can vary between different parts ...
Mutation rates and fitness costs of deleterious mutations are difficult to measure in vivo but essential for a quantitative understanding of evolution. Using whole genome deep sequencing data from longitudinal samples during untreated HIV-1 infection, we estimated mutation rates and fitness costs in HIV-1 from the dynamics of genetic variation. At approximately neutral sites, mutations accumulate with a rate of 1.2 × 10−5 per site per day, in agreement with the rate measured in cell cultures. We estimated the rate from G to A to be the largest, followed by the other transitions C to T, T to C, and A to G, while transversions are less frequent. At other sites, mutations tend to reduce virus replication. We estimated the fitness cost of mutations at every site in the HIV-1 genome using a model of mutation selection balance. About half of all non-synonymous mutations have large fitness costs (>10 percent), while most synonymous mutations have costs <1 percent. The cost of synonymous mutations is especially low in most of pol where we could not detect measurable costs for the majority of synonymous mutations. In contrast, we find high costs for synonymous mutations in important RNA structures and regulatory regions. The intra-patient fitness cost estimates are consistent across multiple patients, indicating that the deleterious part of the fitness landscape is universal and explains a large fraction of global HIV-1 group M diversity.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.