This study describes and validates a new method for metagenomic biomarker discovery by way of class comparison, tests of biological consistency, and effect size estimation. The method addresses the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities, a central problem in metagenomics. We extensively validate the method on several microbiomes and provide a convenient online interface at http://huttenhower.sph.harvard.edu/lefse/.
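
The three ingredients named in this abstract (class comparison, a consistency check, and an effect size) can be illustrated with a minimal sketch for a single feature. This is not the published implementation: the exact permutation test, the direction-agreement check, and the standardized mean difference are stand-ins chosen for illustration, and all function names and abundance values are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(a) + list(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    hits = count = 0
    # Enumerate every way of assigning len(a) of the pooled values to class A.
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        count += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return hits / count


def consistent_direction(subgroups_a, subgroups_b):
    """True if every subgroup of A differs from every subgroup of B in the same direction."""
    signs = {mean(sa) > mean(sb) for sa in subgroups_a for sb in subgroups_b}
    return len(signs) == 1


def standardized_effect(a, b):
    """Difference of means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


# Hypothetical relative abundances of one taxon, two subgroups per class.
class_a = [[0.30, 0.35], [0.32, 0.38]]
class_b = [[0.10, 0.12], [0.08, 0.15]]
flat_a = [x for group in class_a for x in group]
flat_b = [x for group in class_b for x in group]

p_value = exact_permutation_pvalue(flat_a, flat_b)
is_consistent = consistent_direction(class_a, class_b)
effect = standardized_effect(flat_a, flat_b)
print(p_value, is_consistent, effect)
```

In this sketch a feature would be reported as a candidate biomarker only if it passes the class comparison, keeps the same direction across subgroups, and shows an effect size above a user-chosen threshold.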
Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin, and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics, and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analyzed the largest cohort and set of distinct, clinically relevant body habitats to date. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families, and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology, and translational applications of the human microbiome.
To the Editor: MetaPhlAn (metagenomic phylogenetic analysis) 1 is a method for characterizing the taxonomic profiles of whole-metagenome shotgun (WMS) samples that has been used successfully in large-scale microbial community studies 2,3. This work complements the original species-level profiling method with a system for eukaryotic and viral quantitation, strain-level identification and strain tracking. These and other extensions make the MetaPhlAn2 computational package (http://segatalab.cibio.unitn.it/tools/metaphlan2/ and Supplementary Software) an efficient tool for mining WMS samples.
Our method infers the presence and read coverage of clade-specific markers to unequivocally detect the taxonomic clades present in a microbiome sample and estimate their relative abundance 1. MetaPhlAn2 includes an expanded set of ~1 million markers (184 ± 45 for each bacterial species) from >7,500 species (Supplementary Tables 1-3), based on the approximately tenfold increase in the number of sequenced genomes in the past 2 years. Subspecies markers enable strain-level analyses, and quasi-markers improve accuracy and allow the detection of viruses and eukaryotic microbes (a full list of additions is provided in Supplementary Notes 1-3 and Supplementary Fig. 1).
We validated MetaPhlAn2 using 24 synthetic metagenomes comprising 656 million reads and 1,295 species (Supplementary Note 4 and Supplementary Table 4). MetaPhlAn2 proved more accurate (average correlation: 0.95 ± 0.05) than mOTU 4 and Kraken 5 (0.80 ± 0.21 and 0.75 ± 0.22, respectively) (Fig. 1a, Supplementary Figs. 2-9 and Supplementary Tables 5-11), with fewer false positives (an average of 10, compared with 22 and 23 for mOTU and Kraken, respectively) and false negatives (an average of 12, compared with 27 for the other two methods), even when including genomes that were absent from the reference database (Supplementary Note 4).
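
As a rough illustration of the idea described above (estimating relative abundance from the read coverage of clade-specific markers, then counting false positives and false negatives against a known composition), consider the following sketch. It assumes a deliberately simplified estimator, total mapped reads divided by total marker length, which is not necessarily how MetaPhlAn2 computes coverage; the function names, marker identifiers, and counts are all hypothetical.

```python
def clade_coverage(marker_reads, marker_lengths):
    """Average coverage of a clade: mapped reads per marker base (simplified assumption)."""
    total_reads = sum(marker_reads.values())
    total_length = sum(marker_lengths[m] for m in marker_reads)
    return total_reads / total_length


def relative_abundance(coverage_by_clade):
    """Normalize per-clade coverage so that the values sum to 1."""
    total = sum(coverage_by_clade.values())
    if total == 0:
        return {clade: 0.0 for clade in coverage_by_clade}
    return {clade: cov / total for clade, cov in coverage_by_clade.items()}


def false_calls(predicted, truth):
    """Return (false positives, false negatives) for detected vs. true clades."""
    predicted, truth = set(predicted), set(truth)
    return len(predicted - truth), len(truth - predicted)


# Hypothetical marker lengths (bases) and mapped read counts per clade.
marker_lengths = {"m1": 1000, "m2": 1000, "m3": 1000}
reads_per_clade = {
    "species_A": {"m1": 50, "m2": 30},
    "species_B": {"m3": 20},
}

coverage = {
    clade: clade_coverage(reads, marker_lengths)
    for clade, reads in reads_per_clade.items()
}
abundance = relative_abundance(coverage)
detected = [clade for clade, value in abundance.items() if value > 0]
fp, fn = false_calls(detected, truth=["species_A", "species_C"])
print(abundance, fp, fn)
```

Normalizing by marker length keeps clades with many or long markers from appearing more abundant simply because they offer more sequence for reads to map to.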
With the adoption of the BowTie2 fast mapper and support for parallelism, MetaPhlAn2 is more than ten times faster than MetaPhlAn, and its speed is comparable to that of other tested approaches (Supplementary Fig. 10).
We applied MetaPhlAn2 to four elbow-skin samples that we sequenced from three subjects (Fig. 1b, Supplementary Note 5 and Supplementary Table 12). Our data showed that Propionibacterium acnes and Staphylococcus epidermidis dominated these sites, in agreement with expected genus-level results 6, while providing species-level resolution. Together with these core species, we found Malassezia globosa in 93.65% of samples and confirmed it by coverage analysis (Supplementary Fig. 11). Although M. globosa is a known colonizer of the skin, its metagenomic characterization highlights the ability of MetaPhlAn2 to identify non-prokaryotic species. Phages (e.g., for Propionibacterium) and double-stranded DNA viruses of the Polyomavirus genus were also consistently detected. We subsequently profiled the whole set of 982 samples from other body sites from the Human Microbiome Project (HMP), including 219 samples sequenced after the initi...