Whole-genome sequencing projects are increasingly populating the tree of life and characterizing biodiversity [1–4]. Sparse taxon sampling has previously been proposed to confound phylogenetic inference [5], and captures only a fraction of the genomic diversity. Here we report a substantial step towards the dense representation of avian phylogenetic and molecular diversity, by analysing 363 genomes from 92.4% of bird families, including 267 newly sequenced genomes produced for phase II of the Bird 10,000 Genomes (B10K) Project. We use this comparative genome dataset in combination with a pipeline that leverages a reference-free whole-genome alignment to identify orthologous regions in greater numbers than has previously been possible and to recognize genomic novelties in particular bird lineages. The densely sampled alignment provides a single-base-pair map of selection, has more than doubled the fraction of bases that are confidently predicted to be under conservation and reveals extensive patterns of weak selection in predominantly non-coding DNA. Our results demonstrate that increasing the diversity of genomes used in comparative studies can reveal more shared and lineage-specific variation, and improve the investigation of genomic characteristics. We anticipate that this genomic resource will offer new perspectives on evolutionary processes in cross-species comparative analyses and assist in efforts to conserve species.
Background: Metagenomic sequencing is a well-established tool in the modern biosciences. While it promises unparalleled insights into the genetic content of the biological samples studied, conclusions drawn are at risk from biases inherent to the DNA sequencing methods, including inaccurate abundance estimates as a function of genomic guanine-cytosine (GC) content. Results: We explored such GC biases across many commonly used platforms in experiments sequencing multiple genomes (with mean GC contents ranging from 28.9% to 62.4%) and metagenomes. GC bias profiles varied among different library preparation protocols and sequencing platforms. We found that our workflows using MiSeq and NextSeq were hindered by major GC biases, with problems becoming increasingly severe outside the 45–65% GC range, leading to falsely low coverage in GC-rich and especially GC-poor sequences, where genomic windows with 30% GC content had >10-fold less coverage than windows close to 50% GC content. We also showed that GC content correlates tightly with coverage biases. The PacBio and HiSeq platforms showed GC bias profiles similar to each other, which were distinct from those seen in the MiSeq and NextSeq workflows. The Oxford Nanopore workflow was not afflicted by GC bias. Conclusions: These findings indicate potential sources of difficulty, arising from GC biases, in genome sequencing that could be pre-emptively addressed with methodological optimizations, provided that the GC biases inherent to the relevant workflow are understood. Furthermore, we recommend a more critical approach to quantitative abundance estimates in metagenomic studies. In the future, metagenomic studies should take steps to account for the effects of GC bias before drawing conclusions, or they should use a demonstrably unbiased workflow.
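The window-level comparison described in this abstract (coverage of a genomic window set against that window's GC content) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `gc_fraction` and `window_gc_and_coverage`, the non-overlapping window scheme, and the use of a per-base depth list as input are all assumptions made for illustration only.

```python
def gc_fraction(seq):
    """Return the G+C fraction among unambiguous bases (A, C, G, T), or None."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def window_gc_and_coverage(genome, depths, window=1000):
    """Hypothetical helper: pair each window's GC fraction with its relative coverage.

    genome -- reference sequence as a string
    depths -- per-base read depth, same length as genome
    Relative coverage is the window's mean depth divided by the genome-wide
    mean depth, so 1.0 means "average" and 0.1 means a 10-fold deficit.
    """
    if len(genome) != len(depths):
        raise ValueError("genome and depths must have the same length")
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        raise ValueError("no coverage at all; relative coverage is undefined")

    rows = []
    # Non-overlapping windows; a trailing partial window is skipped.
    for start in range(0, len(genome) - window + 1, window):
        gc = gc_fraction(genome[start:start + window])
        if gc is None:  # window contains only ambiguous bases (e.g. N)
            continue
        window_mean = sum(depths[start:start + window]) / window
        rows.append((gc, window_mean / mean_depth))
    return rows


if __name__ == "__main__":
    # Toy input: an AT-only window with low depth, then a GC-only window.
    toy_genome = "AT" * 5 + "GC" * 5
    toy_depths = [10] * 10 + [30] * 10
    for gc, rel in window_gc_and_coverage(toy_genome, toy_depths, window=10):
        print(f"GC={gc:.2f} relative coverage={rel:.2f}")
```

In a workflow without GC bias, the relative coverage values would be expected to stay near 1.0 across the GC range; a systematic drop at low or high GC would correspond to the kind of bias the abstract reports.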
Salmonids are important sources of protein for a large proportion of the human population. Interaction between the gut microbiota and host has been shown to affect the host phenotype in mammals, but relatively little is known about microbiota-host interaction in fish. Mycoplasma species are a major constituent of the gut microbiota of salmonids, often representing the majority of microbial cells. Despite the frequently reported dominance of intestinal Mycoplasma species, very little is known about their phylogenetic placement, functions and potential evolutionary relationships with their salmonid hosts. In this study, we utilise 2.9 billion metagenomic reads generated from 12 samples from three different salmonid host species to I) characterise and curate the first metagenome-assembled genomes (MAGs) of Mycoplasma dominating the intestines of three different salmonid species, II) establish the phylogeny of these salmonid candidate Mycoplasma species using known Mycoplasma genomes, III) perform a comprehensive pangenomic analysis of Mycoplasma, IV) decipher the putative functionalities of the salmonid MAGs and reveal specific functions expected to benefit the host. Our data provide a basis for future studies examining the composition and function of the salmonid microbiota, with the potential to be further exploited to improve animal health and growth in aquaculture.
Salmonids are important sources of protein for a large proportion of the human population. Mycoplasma species are a major constituent of the gut microbiota of salmonids, often representing the majority of the microbiota. Despite the frequently reported dominance of salmonid-related Mycoplasma species, little is known about their phylogenomic placement, functions and potential evolutionary relationships with their salmonid hosts. In this study, we utilise 2.9 billion metagenomic reads generated from 12 samples from three different salmonid host species to I) characterise and curate the first metagenome-assembled genomes (MAGs) of Mycoplasma dominating the intestines of three different salmonid species, II) establish the phylogeny of these salmonid candidate Mycoplasma species, III) perform a comprehensive pangenomic analysis of Mycoplasma, IV) decipher the putative functionalities of the salmonid MAGs and reveal specific functions expected to benefit the host. Our data provide a basis for future studies examining the composition and function of the salmonid microbiota.