To the Editor: MetaPhlAn (metagenomic phylogenetic analysis) 1 is a method for characterizing the taxonomic profiles of whole-metagenome shotgun (WMS) samples that has been used successfully in large-scale microbial community studies 2,3. This work complements the original species-level profiling method with a system for eukaryotic and viral quantitation, strain-level identification and strain tracking. These and other extensions make the MetaPhlAn2 computational package (http://segatalab.cibio.unitn.it/tools/metaphlan2/ and Supplementary Software) an efficient tool for mining WMS samples. Our method infers the presence and read coverage of clade-specific markers to unequivocally detect the taxonomic clades present in a microbiome sample and estimate their relative abundance 1. MetaPhlAn2 includes an expanded set of ~1 million markers (184 ± 45 for each bacterial species) from >7,500 species (Supplementary Tables 1-3), based on the approximately tenfold increase in the number of sequenced genomes in the past 2 years. Subspecies markers enable strain-level analyses, and quasi-markers improve accuracy and allow the detection of viruses and eukaryotic microbes (a full list of additions is provided in Supplementary Notes 1-3 and Supplementary Fig. 1). We validated MetaPhlAn2 using 24 synthetic metagenomes comprising 656 million reads and 1,295 species (Supplementary Note 4 and Supplementary Table 4). MetaPhlAn2 proved more accurate (average correlation: 0.95 ± 0.05) than mOTU 4 and Kraken 5 (0.80 ± 0.21 and 0.75 ± 0.22, respectively) (Fig. 1a, Supplementary Figs. 2-9 and Supplementary Tables 5-11), with fewer false positives (an average of 10, compared with 22 and 23 for mOTU and Kraken, respectively) and false negatives (an average of 12, compared with 27 for the other two methods), even when including genomes that were absent from the reference database (Supplementary Note 4).
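The idea of estimating relative abundance from the read coverage of clade-specific markers can be illustrated with a minimal sketch. This is not the MetaPhlAn2 implementation: the function names, the fixed read length, the plain mean across markers, and the toy numbers are all assumptions made for illustration, and the actual tool may use a different (for example, more robust) estimator.

```python
from statistics import mean


def clade_coverage(marker_hits, marker_lengths, read_length=100):
    """Average per-base coverage over the markers of one clade.

    marker_hits: dict marker_id -> number of reads mapped to that marker
    marker_lengths: dict marker_id -> marker length in bases (this clade only)
    read_length: assumed fixed read length (hypothetical parameter)
    """
    per_marker = [
        marker_hits.get(marker, 0) * read_length / length
        for marker, length in marker_lengths.items()
    ]
    return mean(per_marker) if per_marker else 0.0


def relative_abundances(coverage_by_clade, min_coverage=0.0):
    """Normalize clade coverages so that the detected clades sum to 1."""
    detected = {c: v for c, v in coverage_by_clade.items() if v > min_coverage}
    total = sum(detected.values())
    return {c: v / total for c, v in detected.items()} if total else {}


# Toy data (hypothetical clades "A" and "B" with made-up markers).
markers = {"A": {"a1": 1000, "a2": 500}, "B": {"b1": 1000}}
hits = {"a1": 30, "a2": 15, "b1": 10}

coverage = {clade: clade_coverage(hits, lengths) for clade, lengths in markers.items()}
abundance = relative_abundances(coverage)
print(abundance)  # {'A': 0.75, 'B': 0.25}
```

Because markers are unique to a clade, coverage can be attributed without resolving ambiguous read assignments, and normalizing by marker length keeps clades with many or long markers from being overcounted.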
With the adoption of the BowTie2 fast mapper and support for parallelism, MetaPhlAn2 is more than ten times faster than MetaPhlAn, and its speed is comparable to that of other tested approaches (Supplementary Fig. 10). We applied MetaPhlAn2 to four elbow-skin samples that we sequenced from three subjects (Fig. 1b, Supplementary Note 5 and Supplementary Table 12). Our data showed that Propionibacterium acnes and Staphylococcus epidermidis dominated these sites, in agreement with expected genus-level results 6, while providing species-level resolution. Together with these core species, we found Malassezia globosa in 93.65% of samples and confirmed it by coverage analysis (Supplementary Fig. 11). Although M. globosa is a known colonizer of the skin, its metagenomic characterization highlights the ability of MetaPhlAn2 to identify non-prokaryotic species. Phages (e.g., for Propionibacterium) and double-stranded DNA viruses of the Polyomavirus genus were also consistently detected. We subsequently profiled the whole set of 982 samples from other body sites from the Human Microbiome Project (HMP), including 219 samples sequenced after the initi...
The acquisition and development of the infant microbiome are key to establishing a healthy host-microbiome symbiosis. The maternal microbial reservoir is thought to play a crucial role in this process. However, the source and transmission routes of the infant pioneering microbes are poorly understood. To address this, we longitudinally sampled the microbiome of 25 mother-infant pairs across multiple body sites from birth up to 4 months postpartum. Strain-level metagenomic profiling showed a rapid influx of microbes at birth followed by strong selection during the first few days of life. Maternal skin and vaginal strains colonize only transiently, and the infant continues to acquire microbes from distinct maternal sources after birth. Maternal gut strains proved more persistent in the infant gut and ecologically better adapted than those acquired from other sources. Together, these data describe the mother-to-infant microbiome transmission routes that are integral in the development of the infant microbiome.
Among the human health conditions linked to microbial communities, phenotypes are often associated with only a subset of strains within causal microbial groups. Although it has been critical for decades in microbial physiology to characterize individual strains, this has been challenging when using culture-independent high-throughput metagenomics. We introduce StrainPhlAn, a novel metagenomic strain identification approach, and apply it to characterize the genetic structure of thousands of strains from more than 125 species in more than 1500 gut metagenomes drawn from populations spanning North and South American, European, Asian, and African countries. The method relies on per-sample dominant sequence variant reconstruction within species-specific marker genes. It identified primarily subject-specific strain variants (<5% inter-subject strain sharing), and we determined that a single strain typically dominated each species and was retained over time (for >70% of species). Microbial population structure was correlated in several distinct ways with the geographic structure of the host population. In some cases, discrete subspecies (e.g., for Eubacterium rectale and Prevotella copri) or continuous microbial genetic variations (e.g., for Faecalibacterium prausnitzii) were associated with geographically distinct human populations, whereas few strains occurred in multiple unrelated cohorts. We further estimated the genetic variability of gut microbes, with Bacteroides species appearing remarkably consistent (0.45% median number of nucleotide variants between strains), whereas P. copri was among the most plastic gut colonizers. We thus characterize here the population genetics of previously inaccessible intestinal microbes, providing a comprehensive strain-level genetic overview of the gut microbial diversity.
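The two quantities this abstract relies on, a per-sample dominant sequence variant within a marker gene and the percentage of nucleotide variants between strains, can be sketched as follows. This is an illustrative simplification and not the StrainPhlAn code: the pileup representation, the `min_depth` threshold, and the function names are assumptions.

```python
from collections import Counter


def dominant_variant(pileup_columns, min_depth=2):
    """Majority-vote consensus over a marker; 'N' where depth is too low.

    pileup_columns: iterable of strings, one per marker position, holding
    the bases observed at that position across the mapped reads.
    Ties are resolved in favor of the base seen first in the column.
    """
    consensus = []
    for column in pileup_columns:
        bases = [b for b in column if b in "ACGT"]
        if len(bases) < min_depth:
            consensus.append("N")
        else:
            consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


def percent_variants(seq_a, seq_b):
    """Percentage of differing positions among positions called in both."""
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "N" and b != "N"]
    if not compared:
        return 0.0
    differing = sum(1 for a, b in compared if a != b)
    return 100.0 * differing / len(compared)


# Toy pileups for the same hypothetical 4-base marker in two samples.
sample_1 = dominant_variant(["AAA", "CCG", "GG", "TTT"])
sample_2 = dominant_variant(["AAA", "CCC", "GG", "AAT"])
print(sample_1, sample_2, percent_variants(sample_1, sample_2))
```

Taking only the dominant allele at each position matches the abstract's observation that a single strain typically dominates each species in a sample; a sample with several co-dominant strains would need a different treatment.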
Shotgun metagenomic analysis of the human-associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis.
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
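Cross-study evaluation, in which a model trained on some cohorts is tested on a cohort it has never seen, can be sketched with a leave-one-study-out loop. This is a pure-Python illustration under stated assumptions and not the published framework: the nearest-centroid classifier is a stand-in chosen only to keep the example dependency-free, and the sample layout (`study`, `label`, `features` keys) and toy values are hypothetical.

```python
def leave_one_study_out(samples):
    """Yield (held_out_study, train, test) splits, one per study."""
    studies = sorted({s["study"] for s in samples})
    for held_out in studies:
        train = [s for s in samples if s["study"] != held_out]
        test = [s for s in samples if s["study"] == held_out]
        yield held_out, train, test


def fit_centroids(train):
    """Mean feature vector per label (a nearest-centroid stand-in model)."""
    sums, counts = {}, {}
    for s in train:
        acc = sums.setdefault(s["label"], [0.0] * len(s["features"]))
        for i, x in enumerate(s["features"]):
            acc[i] += x
        counts[s["label"]] = counts.get(s["label"], 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_study_accuracy(samples):
    """Accuracy on each held-out study when training on all other studies."""
    results = {}
    for study, train, test in leave_one_study_out(samples):
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, s["features"]) == s["label"] for s in test)
        results[study] = correct / len(test)
    return results


# Toy profiles: two hypothetical studies, two species-abundance features each.
toy_samples = [
    {"study": "s1", "label": "disease", "features": [0.9, 0.1]},
    {"study": "s1", "label": "control", "features": [0.1, 0.9]},
    {"study": "s2", "label": "disease", "features": [0.8, 0.2]},
    {"study": "s2", "label": "control", "features": [0.2, 0.8]},
]
print(cross_study_accuracy(toy_samples))
```

Comparing these held-out accuracies with within-study cross-validation scores is one way to expose the gap the abstract reports for models transferred between studies.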
Identifying microbial strains and characterizing their functional potential is essential for pathogen discovery, epidemiology and population genomics. We present pangenome-based phylogenomic analysis (PanPhlAn; http://segatalab.cibio.unitn.it/tools/panphlan), a tool that uses metagenomic data to achieve strain-level microbial profiling resolution. PanPhlAn recognized outbreak strains, produced the largest strain-level population genomic study of human-associated bacteria and, in combination with metatranscriptomics, profiled the transcriptional activity of strains in complex communities.
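Pangenome-based strain profiling amounts to deciding, for each gene family in a species pangenome, whether the strain in a sample carries it. A minimal sketch of that decision from per-family coverage is shown below. The thresholding rule (a fixed fraction of the median coverage of covered families), the `presence_ratio` parameter, and the gene-family names are assumptions for illustration and are not taken from PanPhlAn itself.

```python
from statistics import median


def gene_family_profile(coverage, presence_ratio=0.5):
    """Call each gene family present or absent from its read coverage.

    coverage: dict gene_family_id -> mean coverage in one sample
    presence_ratio: hypothetical fraction of the median coverage of
        covered families that a family must reach to be called present
    """
    covered = [c for c in coverage.values() if c > 0]
    if not covered:
        return {family: False for family in coverage}
    threshold = presence_ratio * median(covered)
    return {family: c >= threshold for family, c in coverage.items()}


# Toy coverage values for five hypothetical gene families.
toy_coverage = {"gA": 10.0, "gB": 12.0, "gC": 0.0, "gD": 1.0, "gE": 11.0}
profile = gene_family_profile(toy_coverage)
print(profile)
```

The resulting presence/absence vector can be compared across samples or against reference genomes, which is how a profile of this kind could support tasks such as recognizing a known outbreak strain.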