To tackle incongruence, the topological conflict between different gene trees, phylogenomic studies couple concatenation with practices such as rogue taxon removal or the use of slowly evolving genes. Phylogenomic analysis of 1,070 orthologues from 23 yeast genomes identified 1,070 distinct gene trees, which were all incongruent with the phylogeny inferred from concatenation. Incongruence severity increased for shorter internodes located deeper in the phylogeny. Notably, whereas most practices had little or negative impact on the yeast phylogeny, the use of genes or internodes with high average internode support significantly improved the robustness of inference. We obtained similar results in analyses of vertebrate and metazoan phylogenomic data sets. These results question the exclusive reliance on concatenation and associated practices, and argue that selecting genes with strong phylogenetic signals and demonstrating the absence of significant incongruence are essential for accurately reconstructing ancient divergences.
Phylogenies inferred from different data matrices often conflict with each other necessitating the development of measures that quantify this incongruence. Here, we introduce novel measures that use information theory to quantify the degree of conflict or incongruence among all nontrivial bipartitions present in a set of trees. The first measure, internode certainty (IC), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode (internal branch) in a given set of trees jointly with that of the most prevalent conflicting bipartition in the same tree set. The second measure, IC All (ICA), calculates the degree of certainty for a given internode by considering the frequency of the bipartition defined by the internode in a given set of trees in conjunction with that of all conflicting bipartitions in the same underlying tree set. Finally, the tree certainty (TC) and TC All (TCA) measures are the sum of IC and ICA values across all internodes of a phylogeny, respectively. IC, ICA, TC, and TCA can be calculated from different types of data that contain nontrivial bipartitions, including from bootstrap replicate trees to gene trees or individual characters. Given a set of phylogenetic trees, the IC and ICA values of a given internode reflect its specific degree of incongruence, and the TC and TCA values describe the global degree of incongruence between trees in the set. All four measures are implemented and freely available in version 8.0.0 and subsequent versions of the widely used program RAxML.
Summary The domestication of animals, plants and microbes fundamentally transformed the lifestyle and demography of the human species [1]. Although the genetic and functional underpinnings of animal and plant domestication are well understood, little is known about microbe domestication [2–6]. We systematically examined genome-wide sequence and functional variation between the domesticated fungus Aspergillus oryzae, whose saccharification abilities humans have harnessed for thousands of years to produce sake, soy sauce and miso from starch-rich grains, and its wild relative A. flavus, a potentially toxigenic plant and animal pathogen [7]. We discovered dramatic changes in the sequence variation and abundance profiles of genes and wholesale primary and secondary metabolic pathways between domesticated and wild relative isolates during growth on rice. Through selection by humans, our data suggest that an atoxigenic lineage of A. flavus gradually evolved into a “cell factory” for enzymes and metabolites involved in the saccharification process. These results suggest that whereas animal and plant domestication was largely driven by Neolithic “genetic tinkering” of developmental pathways, microbe domestication was driven by extensive remodeling of metabolism.
Bacteriophage flux can cause the majority of genetic diversity in free-living bacteria. This tenet of bacterial genome evolution generally does not extend to obligate intracellular bacteria owing to their reduced contact with other microbes and a predominance of gene deletion over gene transfer. However, recent studies suggest intracellular coinfections in the same host can facilitate exchange of mobile elements between obligate intracellular bacteria—a means by which these bacteria can partially mitigate the reductive forces of the intracellular lifestyle. To test whether bacteriophages transfer as single genes or larger regions between coinfections, we sequenced the genome of the obligate intracellular Wolbachia strain wVitB from the parasitic wasp Nasonia vitripennis and compared it against the prophage sequences of the divergent wVitA coinfection. We applied, for the first time, a targeted sequence capture array to specifically trap the symbiont's DNA from a heterogeneous mixture of eukaryotic, bacterial, and viral DNA. The tiled array successfully captured the genome with 98.3% efficiency. Examination of the genome sequence revealed the largest transfer of bacteriophage and flanking genes (52.2 kb) to date between two obligate intracellular coinfections. The mobile element transfer occurred in the recent evolutionary past based on the 99.9% average nucleotide identity of the phage sequences between the two strains. In addition to discovering an evolutionary recent and large-scale horizontal phage transfer between coinfecting obligate intracellular bacteria, we demonstrate that “targeted genome capture” can enrich target DNA to alleviate the problem of isolating symbiotic microbes that are difficult to culture or purify from the conglomerate of organisms inside eukaryotes.
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.