The amount of data used in phylogenetics has grown explosively in recent years, and many phylogenies are now inferred from hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each locus in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus alignments or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines sequence manipulation capabilities with a function that calculates basic statistics. The manipulation functions include conversion among popular formats, concatenation, extraction of sites, splitting according to a predefined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percentage of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony-informative sites, and counts of all characters relevant to a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, uses parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python's core modules and needs no additional dependencies. AMAS is distributed under the GNU General Public License.
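To make the summary statistics listed above concrete, here is a minimal sketch in plain Python (standard library only, in the spirit of AMAS's reliance on core modules). It is not AMAS's own code: the function name `summarize_dna`, the choice of gap, `?` and `N` as the undetermined characters, and the decision to ignore other IUPAC ambiguity codes when classifying sites are assumptions for illustration, and AMAS's exact definitions may differ. A site counts as variable if it has at least two unambiguous states, and as parsimony-informative if at least two of those states each occur in two or more taxa.

```python
from collections import Counter

# Assumption for this sketch: gaps, '?' and 'N' count as undetermined characters.
UNDETERMINED = set("-?N")
# Unambiguous nucleotides, used for AT/GC content and site classification.
NUCLEOTIDES = set("ACGT")


def summarize_dna(alignment):
    """Return basic statistics for a DNA alignment given as {taxon: sequence}."""
    seqs = [seq.upper() for seq in alignment.values()]
    if not seqs or not seqs[0]:
        raise ValueError("empty alignment")
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences differ in length; input is not aligned")

    cells = len(seqs) * length
    counts = Counter("".join(seqs))
    undetermined = sum(counts[char] for char in UNDETERMINED)
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]

    variable = informative = 0
    for column in zip(*seqs):
        # Tally only unambiguous bases; gaps, '?', 'N' and other IUPAC codes are skipped.
        states = Counter(char for char in column if char in NUCLEOTIDES)
        if len(states) >= 2:
            variable += 1
            # Parsimony-informative: at least two states, each in two or more taxa.
            if sum(1 for n in states.values() if n >= 2) >= 2:
                informative += 1

    determined = gc + at
    return {
        "taxa": len(seqs),
        "alignment_length": length,
        "matrix_cells": cells,
        "undetermined": undetermined,
        "missing_percent": 100 * undetermined / cells,
        "AT_content": at / determined if determined else 0.0,
        "GC_content": gc / determined if determined else 0.0,
        "variable_sites": variable,
        "prop_variable": variable / length,
        "parsimony_informative": informative,
        "prop_parsimony_informative": informative / length,
        "character_counts": dict(sorted(counts.items())),
    }


alignment = {
    "taxon1": "ACGTAC",
    "taxon2": "ACGTTC",
    "taxon3": "AGGA-C",
    "taxon4": "AGGANC",
}
stats = summarize_dna(alignment)
print(stats["variable_sites"], stats["parsimony_informative"])  # 3 2
```

In the toy alignment, columns 2 and 4 each have two states shared by two taxa (informative), while column 5 has two singleton states (variable but uninformative).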
AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/
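Concatenation and splitting by a partitioning scheme are inverse operations, which a short sketch can illustrate. This is not AMAS's Python API: the functions `concatenate` and `split` are hypothetical, and the sketch assumes that taxa missing from a locus are padded with `?`, that partitions are written as RAxML-style lines such as `DNA, locus1 = 1-4`, and that each partition is a single contiguous range (real partition files can also use codon-position strides, which this sketch does not parse).

```python
def concatenate(loci, missing="?"):
    """Concatenate {locus: {taxon: sequence}} alignments into one supermatrix.

    Taxa absent from a locus are padded with `missing` characters (an assumption
    of this sketch). Returns the supermatrix and RAxML-style partition lines.
    """
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, aln in loci.items():
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            seq = aln.get(taxon, missing * length)
            if len(seq) != length:
                raise ValueError(f"{name}: sequences differ in length")
            rows[taxon].append(seq)
        end = start + length - 1
        # Partition coordinates are 1-based and inclusive.
        partitions.append(f"DNA, {name} = {start}-{end}")
        start = end + 1
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


def split(matrix, partitions, missing="?"):
    """Split a supermatrix back into loci using simple 'start-end' partition lines."""
    loci = {}
    for line in partitions:
        _, spec = line.split(",", 1)
        name, span = (part.strip() for part in spec.split("="))
        start, end = (int(x) for x in span.split("-"))
        block = {taxon: seq[start - 1:end] for taxon, seq in matrix.items()}
        # Drop taxa whose rows for this locus consist entirely of padding.
        loci[name] = {t: s for t, s in block.items() if set(s) != {missing}}
    return loci


loci = {
    "locus1": {"taxonA": "ACGT", "taxonB": "ACGA"},
    "locus2": {"taxonA": "GGC", "taxonC": "GTC"},
}
matrix, partitions = concatenate(loci)
print(matrix["taxonC"])  # ????GTC
print(partitions)        # ['DNA, locus1 = 1-4', 'DNA, locus2 = 5-7']
print(split(matrix, partitions) == loci)  # True
```

Recording partition boundaries at concatenation time is what makes the round trip possible; note that dropping all-padding rows in `split` would also drop a taxon whose real data for a locus happened to be entirely `?`.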
Background
Understanding the phylogenetic relationships among major lineages of multicellular animals (the Metazoa) is a prerequisite for studying the evolution of complex traits such as nervous systems, muscle tissue, or sensory organs. Transcriptome-based phylogenies have dramatically improved our understanding of metazoan relationships in recent years, although several important questions remain. The branching order near the base of the tree, in particular the placement of the poriferan (sponges, phylum Porifera) and ctenophore (comb jellies, phylum Ctenophora) lineages, is one outstanding issue. Recent analyses have suggested that the comb jellies are sister to all remaining metazoan phyla, including sponges. This finding is surprising because it implies that neurons and other complex traits, present in ctenophores and eumetazoans but absent in sponges and placozoans, either evolved twice in Metazoa or were independently and secondarily lost in the lineages leading to sponges and placozoans.
Results
To address the question of basal metazoan relationships, we assembled a novel dataset comprising 1080 orthologous loci derived from 36 publicly available genomes representing major lineages of animals. From this large dataset we procured an optimized set of partitions with high phylogenetic signal for resolving metazoan relationships. This optimized dataset is amenable to the most appropriate and computationally intensive analyses using site-heterogeneous models of sequence evolution. We also employed several strategies to examine the potential for long-branch attraction to bias our inferences. Our analyses strongly support the Ctenophora as the sister lineage to other Metazoa. We find no support for the traditional view uniting the ctenophores and Cnidaria. Our findings are supported by Bayesian comparisons of topological hypotheses, and we find no evidence that they are biased by long-branch attraction.
Conclusions
Our study further clarifies relationships among early-branching metazoan lineages. Our phylogeny supports the still-controversial position of ctenophores as the sister group to all other metazoans. This study also provides a workflow and computational tools for minimizing systematic bias in genome-based phylogenetic analyses. Future studies of metazoan phylogeny will benefit from ongoing efforts to sequence the genomes of additional invertebrate taxa, which will continue to inform our view of the relationships among the major lineages of animals.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-2146-4) contains supplementary material, which is available to authorized users.
Eusocial behavior has arisen in few animal groups, most notably in the aculeate Hymenoptera, a clade comprising ants, bees, and stinging wasps [1-4]. Phylogeny is crucial to understanding the evolution of the salient features of these insects, including eusociality [5]. Yet the phylogenetic relationships among the major lineages of aculeate Hymenoptera remain contentious [6-12]. We address this problem here by generating and analyzing genomic data for a representative series of taxa. We obtain a single well-resolved and strongly supported tree, robust to multiple methods of phylogenetic inference. Apoidea (spheciform wasps and bees) and ants are sister groups, a novel finding that contradicts earlier views that ants are closer to ectoparasitoid wasps. Vespid wasps (paper wasps, yellow jackets, and relatives) are sister to all other aculeates except chrysidoids. Thus, all eusocial species of Hymenoptera are contained within two major groups, characterized by transport of larval provisions and nest construction, likely prerequisites for the evolution of eusociality. These two lineages are interpolated among three other clades of wasps whose species are predominantly ectoparasitoids on concealed hosts, the inferred ancestral condition for aculeates [2]. This phylogeny provides a new framework for exploring the evolution of nesting, feeding, and social behavior within the stinging Hymenoptera.
In our recent paper on the phylogeny of aculeate Hymenoptera, we supplemented our primary data set with publicly available genome sequence data from four bee species, three ant species, and one wasp species. However, we did not give detailed citations for these data, nor did we make a clear distinction between those species whose genomes had been formally published and those for which data had been made publicly available prior to publication. The published genomes were those of the wasp Nasonia vitripennis [1], the honeybee Apis mellifera [2], and the ants Harpegnathos saltator [3], Pogonomyrmex barbatus [4], and Linepithema humile [5]. The prepublication genome data came from the bees Lasioglossum albipes (NCBI Sequence Read Archive SRR578269, as part of the Lasioglossum albipes WGS project, http://www.ncbi.nlm.nih.gov/bioproject/174755), Megachile rotundata (NCBI Protein database search, with data coming mostly from the Megachile genome sequencing project, http://www.ncbi.nlm.nih.gov/bioproject/66515), and Bombus terrestris (protein set from NCBI RefSeq and Genome Annotation projects, derived from genomic sequence generated by the Bumble Bee Genome Project, https://www.hgsc.bcm.edu/arthropods/bumble-bee-genome-project).
We apologize for any confusion created by the lack of explicit citations of these data sources.