Remipedes are a small and enigmatic group of crustaceans, first described only 30 years ago. Analyses of both morphological and molecular data have recently suggested a close relationship between Remipedia and Hexapoda. If true, the remipedes occupy an important position in pancrustacean evolution and may be pivotal for understanding the evolutionary history of crustaceans and hexapods. However, it is important to test this hypothesis using new data and new types of analytical approaches. Here, we assembled a phylogenomic data set of 131 taxa, incorporating newly generated 454 expressed sequence tag (EST) data from six species of crustaceans, representing five lineages (Remipedia, Laevicaudata, Spinicaudata, Ostracoda, and Malacostraca). This data set includes all crustacean species for which EST data are available (46 species), and our largest alignment encompasses 866,479 amino acid positions and 1,886 genes. A series of phylogenomic analyses was performed to evaluate pancrustacean relationships. We significantly improved the quality of our data for predicting putative orthologous genes and for generating data subsets by matrix reduction procedures, thereby improving the signal to noise ratio in the data. Eight different data sets were constructed, representing various combinations of orthologous genes, data subsets, and taxa. Our results demonstrate that the different ways to compile an initial data set of core orthologs and the selection of data subsets by matrix reduction can have marked effects on the reconstructed phylogenetic trees. Nonetheless, all eight data sets strongly support Pancrustacea with Remipedia as the sister group to Hexapoda. This is the first time that a sister group relationship of Remipedia and Hexapoda has been inferred using a comprehensive phylogenomic data set that is based on EST data. We also show that selecting data subsets with increased overall signal can help to identify and prevent artifacts in phylogenetic analyses.
Phylogenetic relationships of the primarily wingless insects are still considered unresolved. Even the most comprehensive phylogenomic studies that addressed this question did not yield congruent results. To address these problems, we analyzed the sources of incongruence in these phylogenomic studies using an extended transcriptome data set. Our analyses showed that unevenly distributed missing data can be severely misleading by inflating node support despite the absence of phylogenetic signal. Consequently, only decisive data sets should be used, that is, data sets that exclusively comprise data blocks containing all taxa whose relationships are addressed. Additionally, we used Four-cluster Likelihood Mapping (FcLM) to measure the degree of congruence among genes of a data set, as a measure of support alternative to the bootstrap. FcLM showed incongruent signal among genes, which in our case is correlated neither with functional class assignment of these genes nor with model misspecification due to unpartitioned analyses. The data set analyzed here is currently the largest covering primarily wingless insects, but it failed to elucidate their interordinal phylogenetic relationships. Although this is unsatisfying from a phylogenetic perspective, we try to show that analyses of structure and signal within phylogenomic data can protect us from biased phylogenetic inferences due to analytical artifacts.
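The decisiveness criterion described above can be illustrated with a minimal sketch. It assumes a simplified reading in which a data block (for example, a gene partition) is kept only if it contains every taxon of interest. The function name `decisive_blocks`, the block names, and the taxon sets are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, List, Set


def decisive_blocks(block_taxa: Dict[str, Set[str]],
                    focal_taxa: Iterable[str]) -> List[str]:
    """Return the names of data blocks that contain all focal taxa.

    block_taxa maps a block name (e.g. a gene) to the set of taxa that
    have data in that block. A block is kept only if every focal taxon
    is present; this is a simplified stand-in for the decisiveness
    criterion, not the authors' implementation.
    """
    focal = set(focal_taxa)
    return sorted(name for name, taxa in block_taxa.items()
                  if focal <= set(taxa))


# Hypothetical toy input: three gene blocks with uneven taxon coverage.
example_blocks = {
    "geneA": {"Protura", "Collembola", "Diplura", "Insecta"},
    "geneB": {"Protura", "Insecta"},
    "geneC": {"Protura", "Collembola", "Diplura", "Insecta", "Remipedia"},
}
example_focal = ["Protura", "Collembola", "Diplura", "Insecta"]
kept = decisive_blocks(example_blocks, example_focal)
```

In this toy input, `geneB` lacks two of the focal taxa and is dropped, so it cannot contribute spurious support to nodes it carries no information about.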
Background: Whenever different data sets arrive at conflicting phylogenetic hypotheses, only testable causal explanations of sources of errors in at least one of the data sets allow us to critically choose among the conflicting hypotheses of relationships. The large (28S) and small (18S) subunit rRNAs are among the most popular markers for studies of deep phylogenies. However, some nodes supported by these data are suspected of being artifacts caused by peculiarities of the evolution of these molecules. Arthropod phylogeny is an especially controversial subject dotted with conflicting hypotheses which are dependent on data set and method of reconstruction. We assume that phylogenetic analyses based on these genes can be improved further by i) enlarging the taxon sample, ii) employing more realistic models of sequence evolution incorporating nonstationary substitution processes, and iii) considering covariation and pairing of sites in rRNA genes.
The present analyses employ the almost complete sequence of the 28S rRNA gene to investigate phylogenetic relationships among Pancrustacea, placing special emphasis on the position of basal hexapod lineages. This study utilizes a greater number of characters and taxa of Protura, Collembola and Diplura than previous analyses to focus on conflicts in the reconstruction of the early steps in hexapod evolution. Phylogenetic trees are mainly based on Bayesian approaches, but likewise include analyses with Maximum Likelihood and Maximum Parsimony. Different analyses, including the application of a mixed DNA/RNA substitution model, were performed to reduce possible misleading effects of non-stationarity of nucleotide frequencies, saturation and character independence to a minimum. This is the first time that a mixed DNA/RNA model has been applied to analyse 28S rRNA sequences of basal hexapods. All methods yielded strong support for the monophyly of Collembola, Diplura, Dicondylia and Insecta s.str., as well as for a cluster composed of Diplura and Protura (‘Nonoculata-hypothesis’). However, the last cluster may be an artifact caused by a shared GC bias of the 28S sequences between these orders, in combination with a long branch effect. The instability of the position of the ‘Nonoculata’ within Pancrustacea further bears out the misleading effect of non-stationarity of nucleotide frequencies. Protura and Diplura either form the sister-group to Collembola (Entognatha) or cluster with branchiopod crustaceans. Overall, the phylogenetic signal of the complete sequences of the 28S rRNA gene favours monophyly of Hexapoda over paraphyly. However, further corroboration from independent data is needed to rule out the competing hypothesis of mutually paraphyletic Crustacea and Hexapoda.
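A shared GC bias of the kind mentioned above can be screened for with a simple per-sequence composition check. The following is a minimal sketch, not the procedure used in the study: the helper names `gc_content` and `gc_spread` are hypothetical, and the toy sequences are invented for illustration only.

```python
from typing import Dict


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T/U).

    Gaps and ambiguity codes are ignored. RNA sequences are handled by
    treating U as T.
    """
    seq = seq.upper().replace("U", "T")
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (counts["G"] + counts["C"]) / total


def gc_spread(alignment: Dict[str, str]) -> float:
    """Difference between the highest and lowest GC content in an alignment.

    A large spread hints at non-stationary base composition among taxa;
    it is a crude indicator, not a formal test.
    """
    values = [gc_content(seq) for seq in alignment.values()]
    return max(values) - min(values)


# Invented toy alignment: two GC-rich sequences and one AT-rich sequence.
toy_alignment = {
    "taxon1": "GGCCGGCCAT",
    "taxon2": "GCGCGC--AT",
    "taxon3": "ATATATGCAT",
}
spread = gc_spread(toy_alignment)
```

Taxa that group together in a tree and also share an unusually high or low GC content, as suggested here for Diplura and Protura, are candidates for a composition-driven artifact and may warrant analysis under a non-stationary model.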