Attempts have been made to define the relationship among the three domains (eukaryotes, archaea, and eubacteria) using phylogenetic tree analyses of 16S rRNA sequences and of various protein sequences. Because the results are inconsistent, the eukaryotic genome is inferred to have a chimeric structure. In our previous studies, which used the whole sets of open reading frames (ORFs) of many genomes, we suggested that eukaryotes originated from the symbiosis of an archaeon into a eubacterium. In those studies, the species participating in the symbiosis were not identified, and the effect of gene duplication after speciation (in-paralogs) was not addressed. To avoid the influence of in-paralogs, we developed a new method to calculate orthologous ORFs. Furthermore, we separated eukaryotic in-paralogs into three groups by sequence similarity to archaea, eubacteria (other than alpha-proteobacteria), and alpha-proteobacteria, and treated each group as an individual organism. This analysis clarified the relationship between the three ORF groups and the functional classification. Introducing this new method into a gene-content-based phylogenetic tree analysis of 66 organisms (4 eukaryotes, 13 archaea, and 49 eubacteria) suggests that eukaryotes originated from the symbiosis of Pyrococcus into gamma-proteobacteria.
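As a rough illustration of what a gene-content comparison can look like, the sketch below computes pairwise Jaccard distances between sets of ortholog families. The Jaccard measure, the family names, and the organism labels are assumptions chosen for illustration; the abstract above does not state which distance the authors used.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets of ortholog families."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(gene_content):
    """Return a symmetric dict of pairwise distances keyed by (name, name)."""
    names = sorted(gene_content)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(gene_content[x], gene_content[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


# Hypothetical toy data: one eukaryotic ORF group treated as its own
# "organism", plus one archaeon and one bacterium.
toy_gene_content = {
    "euk_archaeal_like": {"fam1", "fam2", "fam3"},
    "archaeon_A": {"fam1", "fam2", "fam4"},
    "bacterium_B": {"fam3", "fam5", "fam6"},
}

if __name__ == "__main__":
    for pair, d in sorted(distance_matrix(toy_gene_content).items()):
        print(pair, round(d, 3))
```

A matrix of this kind could then be passed to any distance-based tree builder such as neighbor joining; that step is left out here.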
Here, we constructed a phylogenetic tree of 17 bacterial phyla covering eubacteria and archaea by using a new method and 102 carefully selected orthologs from their genomes. One of the most serious disturbing factors in phylogeny construction is the existence of out-paralogs, which cannot easily be detected and discarded. In our method, out-paralogs are detected and removed by constructing a phylogenetic tree of the genes in question and examining the clustered genes in the tree. We also developed ComTree, a method for comparing the topologies (shapes) of two trees. Applying ComTree to the constructed tree, we computed the relative number of orthologs that support each node of the tree. This number is called the Positive Ortholog Ratio (POR), which is conceptually and methodologically different from the frequently used bootstrap value. Our study shows concrete drawbacks of the bootstrap test. Our bacterial phylogeny is consistent with previous results showing that hyperthermophilic bacteria such as Thermotogae and Aquificae diverged earlier than the other eubacterial phyla studied. Notably, our results are consistent whether thermophilic or mesophilic archaea are used to determine the root of the tree. The earliest divergence of hyperthermophilic eubacteria is supported by genes involved in fundamental metabolic processes such as glycolysis and nucleotide and amino acid synthesis.
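The POR idea described above, the fraction of ortholog trees that support a given node, can be sketched as follows. This is not ComTree itself: it assumes rooted trees written as nested tuples, assumes every ortholog tree contains the same taxa, and counts a node as supported only when an ortholog tree contains exactly the same clade. The function names and toy trees are hypothetical.

```python
def clades(tree):
    """Collect every internal-node clade of a nested-tuple tree as a frozenset of leaves."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def positive_ortholog_ratio(species_tree, ortholog_trees):
    """For each clade of the species tree, return the share of ortholog trees containing it."""
    if not ortholog_trees:
        raise ValueError("at least one ortholog tree is required")
    per_tree = [clades(t) for t in ortholog_trees]
    return {
        clade: sum(clade in c for c in per_tree) / len(per_tree)
        for clade in clades(species_tree)
    }


# Hypothetical toy trees over four taxa.
species_tree = (("A", "B"), ("C", "D"))
ortholog_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]

if __name__ == "__main__":
    for clade, ratio in positive_ortholog_ratio(species_tree, ortholog_trees).items():
        print(sorted(clade), round(ratio, 3))
```

Unlike a bootstrap value, which resamples alignment columns, a ratio of this kind is computed from independently built gene trees, which is the conceptual difference the abstract points to.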
Orthologs are widely used for phylogenetic analysis of species; however, identifying genuine orthologs among distantly related species is challenging, because genes obtained through horizontal gene transfer (HGT) and out-paralogs derived from gene duplication before speciation are often present among the predicted orthologs. We developed a program, “Ortholog-Finder,” to obtain ortholog data sets for phylogenetic analysis from all open reading frame data of the species. The program includes five processes for minimizing the effects of HGT and out-paralogs in phylogeny construction: 1) HGT filtering: Genes derived from HGT are detected by examining their base compositions and deleted from the initial sequence data set. 2) Out-paralog filtering: Out-paralogs are detected and deleted from the data set based on sequence similarity. 3) Classification of phylogenetic trees: Phylogenetic trees generated for ortholog candidates are classified as monophyletic or polyphyletic trees. 4) Tree splitting: Polyphyletic trees are bisected to obtain monophyletic trees and remove HGT genes and out-paralogs. 5) Threshold changing: Out-paralogs are further excluded from the data set based on the difference in the similarity scores of genuine orthologs and out-paralogs. Using simulation data, we examined how out-paralogs and HGTs affected species trees built from ortholog data sets obtained by Ortholog-Finder, and we determined the effects of confounding factors. We then used Ortholog-Finder in phylogeny construction for 12 Gram-positive bacteria from two phyla and validated each node of the constructed tree by comparison with individually constructed ortholog trees.
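Step 1 above, filtering by base composition, can be illustrated with a minimal sketch that flags genes whose GC content deviates strongly from the genome-wide mean. The z-score rule, the cutoff of 2.0, and the toy sequences are assumptions for illustration only; the abstract does not specify the statistic or threshold that Ortholog-Finder applies.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C bases in a non-empty nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical_genes(genes, z_cutoff=2.0):
    """Return names of genes whose GC content lies more than z_cutoff
    population standard deviations from the mean over all genes."""
    gc = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(gc.values())
    sigma = pstdev(gc.values())
    if sigma == 0:
        return set()  # all genes share the same composition
    return {name for name, value in gc.items() if abs(value - mu) / sigma > z_cutoff}


# Hypothetical toy genome: five genes at 50% GC and one at 100% GC.
toy_genes = {
    "gene_a": "ATGCATGC",
    "gene_b": "TACGTACG",
    "gene_c": "ATGCTAGC",
    "gene_d": "CATGCATG",
    "gene_e": "GATCGATC",
    "gene_f": "GGCCGGCC",
}

if __name__ == "__main__":
    print(flag_atypical_genes(toy_genes))
```

Composition-based filters of this kind tend to miss transfers between genomes with similar base composition, which may be one reason the pipeline above follows this step with tree-based checks.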
There is currently no consensus on the evolutionary origin of eukaryotes. In search of the ancestors of eukaryotes, we analyzed the phylogeny of 46 genomes, including those of 2 eukaryotes, 8 archaea, and 36 eubacteria. To avoid the effects of gene duplications, we used inparalog pairs of genes with orthologous relationships. First, we grouped these inparalogs into the functional categories of the nucleus, cytoplasm, and mitochondria. Next, we counted the sister groups of eukaryotes in prokaryotic phyla and plotted them on a standard phylogenetic tree. Finally, we used Pearson's chi-square test to estimate the origin of the genomes from specific prokaryotic ancestors. The results suggest that the eukaryotic nuclear genome descends from an archaeon that belonged to neither Euryarchaeota nor Crenarchaeota and that the mitochondrial genome descends from α-proteobacteria. In contrast, genes related to the cytoplasm do not appear to originate from a specific group of prokaryotes.
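The chi-square step above can be sketched as a goodness-of-fit calculation: observed sister-group counts per prokaryotic phylum are compared with counts expected if sister groups were spread in proportion to some null weighting. The counts, the phylum labels, and the choice of null weights (here, a hypothetical number of genomes per phylum) are all assumptions; the abstract does not give the actual contingency layout.

```python
def expected_counts(observed, weights):
    """Distribute the observed total across categories in proportion to weights."""
    total = sum(observed)
    weight_sum = sum(weights)
    return [total * w / weight_sum for w in weights]


def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical toy data: sister-group counts for three phyla and the
# number of genomes sampled from each phylum, used as null weights.
phyla = ["phylum_X", "phylum_Y", "phylum_Z"]
observed = [30, 10, 20]
genomes_per_phylum = [1, 1, 2]

if __name__ == "__main__":
    expected = expected_counts(observed, genomes_per_phylum)
    stat = chi_square_statistic(observed, expected)
    dof = len(observed) - 1
    # For 2 degrees of freedom the 5% critical value is about 5.991.
    print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

A statistic well above the critical value would indicate that the sister groups are concentrated in particular phyla rather than spread according to the null weights, which is the kind of signal the abstract reports for the nuclear and mitochondrial categories but not for the cytoplasmic one.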