Background
Soil and sediment microorganisms are highly phylogenetically diverse but are currently largely under-represented in public molecular databases. Their functional characterization by means of metaproteomics is usually performed using metagenomic sequences acquired for the same sample. However, such hugely diverse metagenomic datasets are difficult to assemble; in parallel, theoretical proteomes from isolates available in generic databases are of high quality. Both these factors advocate for the use of theoretical proteomes in metaproteomics interpretation pipelines. Here, we examined a number of database construction strategies with a view to increasing the outputs of metaproteomics studies performed on soil samples.
Results
The number of peptide-spectrum matches was found to be of comparable magnitude when using public or sample-specific metagenomics-derived databases. However, numbers were significantly increased when a combination of both types of information was used in a two-step cascaded search. Our data also indicate that the functional annotation of the metaproteomics dataset can be maximized by using a combination of both types of databases.
Conclusions
A two-step strategy combining sample-specific metagenome database and public databases such as the non-redundant NCBI database and a massive soil gene catalog allows maximizing the metaproteomic interpretation both in terms of ratio of assigned spectra and retrieval of function-derived information.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.