2020
DOI: 10.1128/msystems.00833-20

Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage

Abstract: Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality feature…

Cited by 18 publications (38 citation statements)
References 59 publications
“…In fact, it was observed that, despite effective SEP enrichment the quantitative aspect of the proteomic study was by and large unaffected. This SEP enrichment method might hold the potential to identify a pool of newly discovered and yet uncharacterized SEPs identified in riboproteogenomics efforts (Willems et al, 2020). As demonstrated before (Garai and Blanc-Potard, 2020), small proteins frequently display irregular amino acid compositions and thus different buffer conditions and (MS-) methodologies might be required in order to extract and identify this elusive class of proteins.…”
Section: Discussion (mentioning)
confidence: 99%
“…Recently, a first large-scale machine learning-aided sequence analysis effort of prokaryotic genomes provided 109 putative small ORFome predictions across the bacterial phylogeny (Miravet-Verde et al, 2019). We and others have also shown that the use of non-redundant tryptic peptide databases based on bacterial genome sequences translated in all six frames, in conjunction with state-of-the-art high resolution MS only increases the search space modestly (∼4-fold increase in case of S. typhimurium in contrast to the over 400-fold inflation for the human genome (Zhu et al, 2018)) and can thus conveniently be implemented, also for the reliable proteogenomic identification of SEPs, when combined with modern, robust post-processing tools like Percolator (Käll et al, 2007; Omasits et al, 2017; Willems et al, 2020).…”
Section: Introduction (mentioning)
confidence: 99%
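
The six-frame, non-redundant tryptic peptide database mentioned in the statement above can be illustrated with a short Python sketch. This is not the cited authors' pipeline: the function names, the minimum peptide length of 7, the "cleave after K/R unless followed by P" rule without missed cleavages, and the use of the standard codon table are all assumptions made for illustration only.

```python
# Illustrative sketch only: six-frame translation of a DNA sequence followed by
# an in-silico tryptic digest, collecting unique peptides into a set.
# All names and parameters here are hypothetical, not taken from the cited work.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate one reading frame; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate three forward and three reverse-complement frames."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def tryptic_peptides(protein, min_len=7):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:  # min_len is an assumed filter
                peptides.append(peptide)
            start = i + 1
    return peptides


def non_redundant_peptides(dna, min_len=7):
    """Unique tryptic peptides from all stop-to-stop segments in six frames."""
    unique = set()
    for frame in six_frame_translations(dna.upper()):
        for segment in frame.split("*"):
            unique.update(tryptic_peptides(segment, min_len))
    return unique
```

Comparing the size of such a peptide set with the set derived from annotated proteins alone gives a rough measure of how much the search space grows, which is the quantity the statement reports as roughly 4-fold for a bacterial genome.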
“…To our knowledge, the longest antisense OLG with proteomic evidence is a 1644 nt ORF (encoding for 548 AA), located in frame -1 in Deinococcus radiodurans [88]. olg1 and olg2 are phylogenetically young genes under selection. For both olg1 and olg2, the OLG sequence is evolving considerably faster at the AA level than the mother gene protein sequence (approximately two and 12 times faster respectively; Supplementary Table 8).…”
Section: Discussion (mentioning)
confidence: 99%
“…Nevertheless, protein evidence of antisense OLGs was provided in some proteomic studies [87], but mainly attributed to a high false positive rate. Proteomic OLG evidence was found for other bacterial genera, including Helicobacter [65], Salmonella [88] or Pseudomonas [48,86]. In P. putida, 44 small antisense-encoded proteins were found using MS [48].…”
Section: Discussion (mentioning)
confidence: 99%