Background: Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification.
Results: Here, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a means to gain more complete information. In particular, the contribution of experimental metagenomic databases proved essential when dealing with mouse samples. Moreover, the use of a "merged" database, containing all metagenomic sequences from the population under study, was found to be generally preferable to the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in terms of functional annotation yields.
Conclusions: This study contributes to identifying host- and database-specific biases that need to be taken into account in a metaproteomic experiment, providing meaningful insights on how to design gut microbiota studies and to perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools should be encouraged, even though this requires appropriate bioinformatic resources.
Electronic supplementary material: The online version of this article (doi:10.1186/s40168-016-0196-8) contains supplementary material, which is available to authorized users.
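The comparison of peptide identifications across parallel database searches described in this abstract can be illustrated with plain set operations. The following is only a minimal sketch: the function name `peptide_overlap`, the database labels, and the peptide strings are hypothetical and are not taken from the study.

```python
def peptide_overlap(ids_by_db):
    """Summarize shared and database-specific peptide identifications.

    ids_by_db: dict mapping a database label to the set of peptide
    sequences identified when searching against that database.
    """
    if not ids_by_db:
        raise ValueError("at least one database result is required")

    # Peptides found by any search, and peptides found by every search.
    all_peptides = set().union(*ids_by_db.values())
    shared = set.intersection(*ids_by_db.values())

    # Peptides identified with one database and with no other.
    unique = {}
    for name, peptides in ids_by_db.items():
        others = set().union(
            *(p for other, p in ids_by_db.items() if other != name)
        )
        unique[name] = peptides - others

    return {"total": len(all_peptides), "shared": shared, "unique": unique}


# Hypothetical example: two searches of the same spectra.
results = peptide_overlap({
    "public": {"PEPA", "PEPB", "PEPC"},
    "metagenomic": {"PEPB", "PEPC", "PEPD"},
})
print(results["total"], sorted(results["shared"]), results["unique"])
```

A non-empty `unique` set for each database is the situation in which running the searches in parallel, rather than choosing a single database, recovers additional identifications.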
Metaproteomics enables the investigation of the protein repertoire expressed by complex microbial communities. However, to unleash its full potential, refinements in bioinformatic approaches for data analysis are still needed. In this context, sequence database selection represents a major challenge. This work assessed the impact of different databases in metaproteomic investigations by using a mock microbial mixture including nine diverse bacterial and eukaryotic species, which was subjected to shotgun metaproteomic analysis. Then, both the microbial mixture and the single microorganisms were subjected to next-generation sequencing to obtain experimental metagenomic- and genomic-derived databases, which were used along with public databases (namely, NCBI, UniProtKB/SwissProt and UniProtKB/TrEMBL, parsed at different taxonomic levels) to analyze the metaproteomic dataset. First, a quantitative comparison in terms of number and overlap of peptide identifications was carried out among all databases. As a result, only 35% of peptides were common to all database classes; moreover, genus/species-specific databases provided up to 17% more identifications compared to databases with generic taxonomy, while the metagenomic database enabled a slight increment with respect to public databases. Then, database behavior in terms of false discovery rate and peptide degeneracy was critically evaluated. Public databases with generic taxonomy exhibited a markedly different trend compared to their counterparts. Finally, the reliability of taxonomic attribution according to the lowest common ancestor approach (using MEGAN and Unipept software) was assessed. The level of misassignments varied among the different databases, and specific thresholds based on the number of taxon-specific peptides were established to minimize false positives. 
This study confirms that database selection has a significant impact on metaproteomic results, and provides critical indications for improving the depth and reliability of those results. Specifically, the use of iterative searches and of suitable filters for taxonomic assignments is proposed with the aim of increasing coverage and trustworthiness of metaproteomic data.
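The lowest common ancestor idea and the peptide-count filter mentioned in this abstract can be sketched as follows. This is an illustrative simplification, not the algorithm implemented in MEGAN or Unipept: the lineage tuples, the function names, and the default `min_peptides=2` are assumptions chosen for the example, and the thresholds actually established in the study are not stated in the abstract.

```python
from collections import Counter


def lowest_common_ancestor(lineages):
    """Return the deepest prefix shared by all lineages.

    Each lineage is a root-to-leaf tuple of taxon names,
    e.g. ("Bacteria", "Firmicutes", "Lactobacillus").
    """
    if not lineages:
        return ()
    common = []
    # zip stops at the shortest lineage, so the LCA is never deeper than it.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return tuple(common)


def assign_taxa(peptide_hits, min_peptides=2):
    """Count peptides per LCA taxon and keep taxa above a threshold.

    peptide_hits: dict mapping a peptide to the list of lineages of all
    proteins it matches. min_peptides is a hypothetical cutoff.
    """
    counts = Counter()
    for lineages in peptide_hits.values():
        lca = lowest_common_ancestor(lineages)
        if lca:
            counts[lca[-1]] += 1
    return {taxon: n for taxon, n in counts.items() if n >= min_peptides}


# Hypothetical peptide-to-lineage matches.
hits = {
    "PEP1": [("Bacteria", "Firmicutes", "Lactobacillus")],
    "PEP2": [("Bacteria", "Firmicutes", "Lactobacillus")],
    "PEP3": [("Bacteria", "Firmicutes", "Lactobacillus"),
             ("Bacteria", "Firmicutes", "Enterococcus")],
    "PEP4": [("Bacteria", "Proteobacteria", "Escherichia")],
}
print(assign_taxa(hits))
```

In this toy input, the taxon supported by a single peptide is discarded by the filter, which is the kind of false-positive control the abstract refers to.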
Background: The study of the gut microbiota (GM) is rapidly moving towards its functional characterization by means of shotgun meta-omics. In this context, there is still no consensus on which microbial functions are consistently and constitutively expressed in the human gut in physiological conditions. Here, we selected a cohort of 15 healthy subjects from a native and highly monitored Sardinian population and analyzed their GMs using shotgun metaproteomics, with the aim of investigating GM functions actually expressed in a healthy human population. In addition, shotgun metagenomics was employed to reveal GM functional potential and to compare metagenome and metaproteome profiles in a combined taxonomic and functional fashion.
Results: Metagenomic and metaproteomic data concerning the taxonomic structure of the GM under study were globally comparable. In contrast, a considerable divergence between genetic potential and functional activity of the healthy human GM was observed, with the metaproteome displaying a higher plasticity, compared to the lower inter-individual variability of metagenome profiles. The taxon-specific contribution to functional activities and metabolic tasks was also examined, giving insights into the peculiar role of several GM members in carbohydrate metabolism (including polysaccharide degradation, glycan transport, glycolysis, and short-chain fatty acid production). Notably, Firmicutes-driven butyrogenesis (mainly due to Faecalibacterium spp.) was shown to be the metabolic activity with the highest expression rate and the lowest inter-individual variability in the study cohort, in line with the previously reported importance of the biosynthesis of this microbial product for gut homeostasis.
Conclusions: Our results provide detailed and taxon-specific information regarding functions and pathways actively working in a healthy GM. The reported discrepancy between expressed functions and functional potential suggests that caution should be used before drawing functional conclusions from metagenomic data, further supporting metaproteomics as a fundamental approach to characterize the human GM metabolic functions and activities.
Electronic supplementary material: The online version of this article (doi:10.1186/s40168-017-0293-3) contains supplementary material, which is available to authorized users.
Background: The massive characterization of host-associated and environmental microbial communities has represented a real breakthrough in the life sciences in recent years. In this context, metaproteomics specifically enables the transition from assessing the genomic potential to actually measuring the functional expression of a microbiome. However, significant research efforts are still required to develop analysis pipelines optimized for metaproteome characterization.
Results: This work presents an efficient analytical pipeline for shotgun metaproteomic analysis, combining bead-beating/freeze-thawing for protein extraction, filter-aided sample preparation for cleanup and digestion, and single-run liquid chromatography-tandem mass spectrometry for peptide separation and identification. The overall procedure is more time-effective and less labor-intensive when compared to state-of-the-art metaproteomic techniques. The pipeline was first evaluated using mock microbial mixtures containing different types of bacteria and yeasts, enabling the identification of up to over 15,000 non-redundant peptide sequences per run with a linear dynamic range from 10^4 to 10^8 colony-forming units. The pipeline was then applied to the mouse fecal metaproteome, leading to the overall identification of over 13,000 non-redundant microbial peptides with a false discovery rate of <1%, belonging to over 600 different microbial species and 250 functionally relevant protein families. An extensive mapping of the main microbial metabolic pathways actively functioning in the gut microbiome was also achieved.
Conclusions: The analytical pipeline presented here may be successfully used for the in-depth and time-effective characterization of complex microbial communities, such as the gut microbiome, and represents a useful tool for the microbiome research community.
Electronic supplementary material: The online version of this article (doi:10.1186/s40168-014-0049-2) contains supplementary material, which is available to authorized users.
Genetic isolates represent exceptional resources for the mapping of complex traits, but not all isolates are similar. We have selected a genetic and cultural isolate, the village of Talana from an isolated area of Sardinia, and propose that this population is suitable for the mapping of complex traits. A wealth of historical and archive data allowed the reconstruction of the demographic and genealogical history of the village. Key features of the population, which has grown slowly with no significant immigration, were defined by using a combination of historical, demographic and genetic studies. The genealogy of each Talana inhabitant was reconstructed and the main maternal and paternal lineages of the village were defined. Haplotype and phylogenetic analyses of the Y chromosome and characterisation of mitochondrial DNA haplogroups were used to determine the number of ancestral village founders. The extent of linkage disequilibrium (LD) was evaluated by the analysis of several microsatellites in chromosomal region Xq13.3, which was previously used to assess the extent of LD. Genealogical reconstructions were confirmed and reinforced by the genetic analyses, since some lineages were found to have merged prior to the beginning of the archival records, suggesting an even smaller number of founders than initially predicted. About 80% of the present-day population appears to derive from eight paternal and eleven maternal ancestral lineages. LD was found to span, on average, a 5-Mb region in Xq13.3. This suggests the possibility of identifying identical-by-descent regions associated with complex traits in a genome-wide search by using a low-density marker map. The present study emphasises the importance of combining genetic studies with genealogical and historical information.