Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences, based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies.
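The abstract above mentions analyzing DNA sequences "possibly with the third codon positions excluded". As a minimal sketch of that preprocessing step, assuming an in-frame coding alignment held as plain strings (the function name and the toy taxa are hypothetical, not taken from the paper), it might look like this:

```python
def drop_third_positions(cds: str) -> str:
    """Return a coding sequence with every third codon position removed.

    Assumes the sequence is in frame and its length is a multiple of 3.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Keep codon positions 1 and 2 (0-based indices 0 and 1 modulo 3).
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)


# Toy alignment (hypothetical data): the two sequences differ only at
# third codon positions, so the reduced sequences become identical.
alignment = {"taxonA": "ATGGCCAAA", "taxonB": "ATGGCTAAG"}
reduced = {name: drop_third_positions(seq) for name, seq in alignment.items()}
```

In this toy case both reduced sequences are "ATGCAA", which illustrates why the step removes most synonymous (and often saturated) differences while keeping the data as nucleotides.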
Different frequencies amongst codons that encode the same amino acid (i.e., synonymous codons) have been observed in multiple species. Studies focused on uncovering the forces that drive such codon usage bias showed that a combined effect of mutational biases and translational selection works to produce different frequencies of synonymous codons. However, only a few studies have been able to measure and distinguish between these forces, which may leave similar traces on the coding regions. Here, we have developed a codon model that allows the disentangling of mutation, selection on amino acids and synonymous codons, and GC-biased gene conversion (gBGC), which we applied to an extensive dataset of 415 chordates and 191 arthropods. We found that chordates need 15 more synonymous codon categories than arthropods to explain the empirical codon frequencies, which suggests that the extent of codon usage bias can vary greatly between animal phyla. Moreover, methylation at CpG sites seems to partially explain these patterns of codon usage bias in chordates but not in arthropods. Our findings also demonstrate that GC-rich codons are disfavoured in both phyla when mutations are GC-biased, and the opposite is true when mutations are AT-biased. This indicates that selection on the genomic coding regions might act primarily to stabilise their GC/AT content. Our study shows that the degree of synonymous codon usage bias varies considerably among animals, but is likely governed by a common underlying dynamic.
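To make "different frequencies of synonymous codons" concrete, the following sketch computes relative synonymous codon usage (RSCU), a standard descriptive statistic. This is an illustration only, not the codon model described in the abstract; it assumes the standard genetic code, and the function name and example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# nuclear standard code; other codes would need a different string).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage for each codon of an observed amino acid.

    RSCU = observed codon count / mean count over its synonymous family,
    so 1.0 means no bias and values above 1.0 mean over-representation.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids not present
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical example: four alanine codons, three GCC and one GCT.
example = rscu("GCCGCCGCCGCT")
```

Here GCC receives an RSCU of 3.0, GCT 1.0, and the unused alanine codons GCA and GCG 0.0. A descriptive statistic like this cannot by itself separate mutation bias from selection or gBGC, which is the gap the model-based approach in the abstract addresses.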
A substitution represents the emergence and fixation of an allele in a population or species and is the fundamental event from which phylogenetic models of sequence evolution are devised. As a result of the increasing availability of genomic sequences, we can currently leverage interspecific variability while reconstructing the tree of life. Substitutions can thus more realistically be modeled as the product of mutation, selection, and genetic drift. However, it remains unclear whether this additional complexity affects our measures of evolutionary times and rates. This study seeks to answer this question by contrasting the traditional substitution model with a population genetic counterpart, using data from 4385 individuals distributed across 179 populations and representing 17 species of animals, plants, and fungi. We found that when the population genetic dynamics are modeled via the substitution rates, the evolutionary times and rates measured by these models are well correlated, suggesting that the substitution models are able to capture the time and pace of their population counterpart. However, a closer inspection of this result showed that the traditional models largely ignore the effect of population size, even when it is explicitly accounted for in the substitution rates, while the effects of mutation and selection are well captured. Our results indicate that superimposing population-genetics results on the substitution rates may be efficiently used to study mutation and selection biases, while other data sources (e.g., life history traits or polymorphisms) need to be additionally integrated to infer variations in the population size. We expect our results to help develop mutation-selection phylogenetic models and inspire unified frameworks between long and short evolutionary time scales.
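The idea that a substitution is "the product of mutation, selection, and genetic drift" can be illustrated with the classical diffusion approximation for the fixation probability of a new mutation. The sketch below assumes a diploid Wright-Fisher population whose effective size equals its census size N, with genic selection coefficient s; it is a textbook illustration, not the model used in the study, and the function and parameter names are hypothetical.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Fixation probability of a single new mutant copy.

    Uses (1 - exp(-2s)) / (1 - exp(-4Ns)), with the neutral limit 1/(2N).
    Very large |4Ns| for deleterious mutations can overflow; this sketch
    does not guard against that.
    """
    if abs(s) < 1e-12:
        return 1.0 / (2 * n)
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)


def substitution_rate(mu: float, s: float, n: int) -> float:
    """Substitution rate = new mutations per generation (2N * mu)
    multiplied by the probability that each one fixes."""
    return 2 * n * mu * fixation_probability(s, n)


mu = 1e-8  # per-site mutation rate per generation (illustrative value)
neutral = substitution_rate(mu, 0.0, 1000)
beneficial = substitution_rate(mu, 1e-3, 1000)
deleterious = substitution_rate(mu, -1e-3, 1000)
```

Under neutrality the population size cancels and the rate equals the mutation rate, whereas for selected mutations the rate depends on the product of N and s. That dependence is one reason a substitution rate alone may carry limited information about population size.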