2018
DOI: 10.1021/acs.jproteome.8b00722

ComPIL 2.0: An Updated Comprehensive Metaproteomics Database

Abstract: We designed a metaproteomic analysis method (ComPIL) to accommodate the ever-increasing number of sequences against which experimental shotgun proteomics spectra could be accurately and rapidly queried. Our objective was to create these large databases for the analysis of complex metasamples with unknown composition, including those derived from human, animal, and environmental microbiomes. The amount of high-throughput sequencing data has substantially increased since our original database was assembled in 20…

Cited by 23 publications (24 citation statements)
References 23 publications
“…We collected a total of 2,829,920 MS2 spectra between all 18 patient samples. These spectra were searched against the ComPIL 2.0 database (contains 4.8 billion unique tryptic peptides from >225 million forward and reverse protein sequences) (23, 24) using the ProLuCID/SEQUEST search engine (20, 46, 47). 503,244 (17.8%) MS2 spectra were mapped to 54,378 distinct peptides at a 1% peptide false discovery rate (2 peptides per protein minimum) using a target-decoy strategy (86).…”
Section: Methods (mentioning)
confidence: 99%
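
The statement above reports a 1% peptide false discovery rate from a target-decoy strategy and a 17.8% identification rate. As a rough illustration only, the following Python sketch shows one common way to estimate such a cutoff: rank matches by score and keep the lowest score at which decoys/targets stays within the limit. The function names, the `(score, is_decoy)` input layout, and the decoys/targets estimator are assumptions made for this example; the quoted study does not describe its exact filtering procedure here.

```python
from typing import List, Optional, Tuple


def fdr_cutoff(
    matches: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[Tuple[float, int]]:
    """Return (score_cutoff, accepted_targets) for a target-decoy filter.

    `matches` holds hypothetical (score, is_decoy) pairs, higher score = better.
    The FDR at each rank is estimated as decoys / targets (an assumption;
    other estimators exist). Returns None if no cutoff satisfies max_fdr.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimated FDR is acceptable.
        if targets and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


def percent_identified(mapped: int, total: int) -> float:
    """Percentage of spectra that were mapped, rounded to one decimal."""
    return round(100.0 * mapped / total, 1)


if __name__ == "__main__":
    # Figures quoted in the citation statement above.
    print(percent_identified(503_244, 2_829_920))
    # Toy matches: three targets, one decoy, one more target.
    toy = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
    print(fdr_cutoff(toy, max_fdr=0.2))
```

With the toy data, a 20% limit stops before the decoy, while a 30% limit admits the lower-scoring target as well, because one decoy among four targets is an estimated 25%.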
“…To address this problem, we developed the Comprehensive Protein Identification Library (ComPIL), a large and scalable proteomics database generally intended for metaproteomics studies (23). In its current iteration (ComPIL 2.0), it houses >4.8 billion unique, tryptic peptides derived from >113 million bacterial, archaeal, viral, and eukaryotic parent protein sequences assembled from public sequencing repositories (24). Relative to metagenomics, LC-MS/MS-based metaproteomics are less commonly applied and more rarely employed in IBD studies. In fact, the first large-scale endeavor to identify proteins from a microbial biofilm community was only disclosed by Banfield et al. in 2005 (27-29).…”
Section: Introduction (mentioning)
confidence: 99%
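
The statements above describe a database of unique tryptic peptides built from forward and reverse protein sequences. A minimal Python sketch of that idea follows, assuming the commonly used trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, no peptide length limits, and decoys made by simple sequence reversal. These parameters and the function names are assumptions for illustration; the quoted text does not give ComPIL's actual digestion settings.

```python
from typing import Iterable, List, Set


def tryptic_peptides(protein: str) -> List[str]:
    """In silico digest: cut after K or R, except when followed by P."""
    peptides: List[str] = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start : i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing peptide with no C-terminal K/R.
        peptides.append(protein[start:])
    return peptides


def reversed_decoy(protein: str) -> str:
    """Build a decoy by reversing the sequence (one common convention)."""
    return protein[::-1]


def unique_peptides(proteins: Iterable[str]) -> Set[str]:
    """Collect the set of distinct peptides across many protein sequences."""
    seen: Set[str] = set()
    for protein in proteins:
        seen.update(tryptic_peptides(protein))
    return seen


if __name__ == "__main__":
    forward = "AAKGGRPCCKDD"  # made-up sequence, not from any real database
    print(tryptic_peptides(forward))
    print(tryptic_peptides(reversed_decoy(forward)))
    print(len(unique_peptides([forward, forward])))
```

Storing peptides in a set is what makes the count "unique": the same peptide arising from homologous proteins in different organisms is kept once.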
“…Second, the protein inference problem [19] is more pronounced in metaproteomics due to many homologous proteins from closely related organisms [20]. As a result, several dedicated bioinformatic tools have been developed or extended for metaproteomic analysis [21-28]. Despite these challenges, the added value of metaproteomics has already been demonstrated in numerous examples from both the environmental and medical fields, providing unprecedented insights into the functional activity of microbial communities [7, 20, 29-41].…”
Section: Main (mentioning)
confidence: 99%
“…In this study, we present a metaproteomic bioinformatics workflow (Figure 1) that uses MS-based data from COVID-19 patients as an input to detect peptides associated with coinfecting organisms. MS files were searched using ComPIL 2.0 [11] against a comprehensive protein sequence database and the detected peptides were used to find taxonomic information [12] about microorganisms present in the sample. Based on the taxonomic information, the mass spectrometry data was reinterrogated using a metaproteomics workflow (Figure 1) within the Galaxy platform to (a) match tandem mass spectra (MS/MS) against a focused custom protein sequence database of clinically significant taxa; and (b) verify detected peptides for their peptide-spectrum match (PSM) quality using the PepQuery software tool [13] and the Lorikeet tandem mass spectrometry (MS/MS) spectral visualization tool.…”
mentioning
confidence: 99%