Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and aberrant alignment.

We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence.

MACSE is distributed as an open-source Java executable file with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.
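The two pitfalls of translation-based alignment described above can be illustrated with a minimal Python sketch. It is not part of MACSE and does not implement its algorithm; it only translates a sequence in a fixed reading frame using the standard genetic code. The example sequence and the names `translate`, `intact` and `shifted` are hypothetical, chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate seq in a single fixed reading frame; '*' marks a stop codon.

    Trailing nucleotides that do not fill a codon are ignored, and codons
    containing characters other than T, C, A, G are reported as 'X'.
    """
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical coding sequence (made up for this illustration).
intact = "ATGGCCAAGTTTGGA"
# Same sequence with one extra nucleotide inserted after the second codon.
shifted = intact[:6] + "T" + intact[6:]

print(translate(intact))       # translation of the undisturbed frame
print(translate(shifted))      # every codon after the insertion is misread
print(translate(shifted[7:]))  # skipping the extra base restores the frame
```

In this made-up case the single inserted nucleotide both creates a premature stop codon and changes every downstream amino acid, so an alignment built on the fixed-frame translation would be misleading from that point on.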
Background
The development of phylogenetic methods that do not rely on fossils for the study of evolutionary processes through time has revolutionized the field of evolutionary biology and resulted in an unprecedented expansion of our knowledge about the tree of life. These methods have helped to shed light on the macroevolution of many taxonomic groups such as the placentals (Mammalia). However, despite the increase in studies addressing the diversification patterns of organisms, no synthesis has addressed the case of the most diversified mammalian clade: the Rodentia.

Results
Here we present a rodent maximum likelihood phylogeny inferred from a molecular supermatrix. It is based on 11 mitochondrial and nuclear genes and covers 1,265 species, i.e., 56% and 81% of the known rodent species and genera, respectively. The inferred topology recovered all Rodentia clades proposed by recent molecular works. A relaxed molecular clock dating approach provided a time framework for speciation events. We found that the Myomorpha clade shows a greater degree of variation in diversification rates than Sciuroidea, Caviomorpha, Castorimorpha and Anomaluromorpha. We identified a number of shifts in diversification rates within the major clades: two in Castorimorpha, three in Ctenohystrica, six within the squirrel-related clade and 24 in the Myomorpha clade. The majority of these shifts occurred within the most recent familial rodent radiations: the Cricetidae and Muridae clades. Using the topological imbalances and the timeline, we discuss the potential role of different diversification factors that might have shaped the rodent radiation.

Conclusions
The present glimpse of the diversification pattern of rodents can be used for further comparative meta-analyses. Muroid lineages have a greater degree of variation in their diversification rates than any other rodent group. Different topological signatures suggest distinct diversification processes among rodent lineages.
In particular, Muroidea and Sciuroidea display widespread distribution and have undergone evolutionary and adaptive radiation on most of the continents. Our results show that rodents experienced shifts in diversification rate regularly through the Tertiary, but at different periods for each clade. A comparison between the rodent fossil record and our results suggests that extinction led to the loss of diversification signal for most of the Paleogene nodes.