2021
DOI: 10.21203/rs.3.rs-691927/v1
Preprint

ionbot: a novel, innovative and sensitive machine learning approach to LC-MS/MS peptide identification

Abstract: Mass spectrometry-based proteomics generates vast amounts of signal data that require computational interpretation to obtain peptide identifications. Dozens of algorithms for this task exist, but all exploit only part of the acquired data to judge a peptide-to-spectrum match (PSM), ignoring important information such as the observed retention time and fragment ion peak intensity pattern. Moreover, only few identification algorithms allow open modification searches that can substantially increase peptide identi…

Citations: cited by 6 publications (9 citation statements)
References: 36 publications
“…Reprocessing the raw data with ionbot [16] using uniform search settings enabled straightforward reanalysis and comparison. However, this approach comes with two main pitfalls.…”
Section: Discussion | mentioning
confidence: 99%
“…These selected projects consisted of 15,146 raw files in total, which were locally reprocessed using ionbot (version 0.6.2) in an open modification search against a protein database containing 75,141 proteins from both Swiss-Prot and TrEMBL (September 2020) as well as common contaminants with S-carbamidomethylation of cysteine and oxidation of methionine as variable modifications [16]. The raw files were manually annotated for their corresponding tissue and cell type of origin by either the metadata present in PRIDE, or by manual curation of the publication linked to the project. If no unambiguous annotation could be given at the cell type level, the annotation was equivalent to that at the tissue level, e.g., if it was not clear if it was a CD4 T-cell or a CD8 T-cell, it was categorized as T-cell in general.…”
Section: Data Preprocessing and Ionbot Reprocessing | mentioning
confidence: 99%
“…We obtained our proteomics dataset from The PRoteomics IDEntifications (PRIDE - https://www.ebi.ac.uk/pride/) database, the world’s largest data repository of mass spectrometry-based proteomics data (26). Specifically, we used 633 human proteomics project experiments with a total of 32,546 runs and reanalyzed them using ionbot (38) with an FDR threshold of 0.01 (39), resulting in a total of 154,885,151 peptide spectrum matches for 18,846 proteins. For the full list of projects, runs, and general statistics of the search see supplementary table (doi: 10.5281/zenodo.6798182).…”
Section: Methods | mentioning
confidence: 99%
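The statement above filters peptide-spectrum matches at an FDR threshold of 0.01 but does not say how the FDR is estimated. As a rough illustration only, the sketch below assumes a standard target-decoy estimate (decoys divided by targets at each score cutoff) converted to q-values. The `PSM` class, its fields, and both function names are hypothetical and are not part of ionbot or of the cited study.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical minimal peptide-spectrum match record."""
    spectrum_id: str
    peptide: str
    score: float      # higher is assumed to mean a better match
    is_decoy: bool    # True if the peptide comes from a decoy sequence


def assign_q_values(psms):
    """Return (psm, q_value) pairs sorted by descending score.

    Assumption: FDR at a given rank is estimated as decoys / targets
    among all PSMs scoring at least as well.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the smallest FDR at this rank or any lower-scoring rank,
    # which makes q-values non-decreasing as the score drops.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()
    return list(zip(ranked, q_values))


def filter_psms(psms, threshold=0.01):
    """Keep target PSMs whose q-value is at or below the threshold."""
    return [p for p, q in assign_q_values(psms)
            if q <= threshold and not p.is_decoy]


example = [
    PSM("s1", "PEPTIDEA", 10.0, False),
    PSM("s2", "PEPTIDEB", 9.0, False),
    PSM("s3", "PEPTIDEC", 8.0, False),
    PSM("s4", "DECOYPEP", 7.0, True),
    PSM("s5", "PEPTIDED", 6.0, False),
]
accepted = filter_psms(example, threshold=0.01)
print([p.spectrum_id for p in accepted])
```

With this toy input, the three top-scoring targets rank above the first decoy, so only they pass the 0.01 cutoff. Real pipelines typically work on millions of PSMs and may use a different estimator.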
“…Mass spectrometry data of the draft human proteome map developed by the Pandey group [55], composed of 30 histologically normal human samples including 17 adult tissues, 7 fetal tissues and 6 purified primary hematopoietic cells, were downloaded from PRIDE project PXD000561 and searched with ionbot version 0.8.0 [56]. Of the 30 samples, each was processed by several sample preparation methods and MS acquisition pipelines to generate 84 technical replicates.…”
Section: Tissue Expression Of Nt-proteoforms Evaluated Through Re-ana... | mentioning
confidence: 99%
“…To further evaluate the expression of Nt-proteoforms in healthy human tissues, we re-analyzed public proteomics data of the draft map of the human proteome developed by the Pandey group [55]. The use of ionbot [56] and a custom-built protein sequence database (composed of UniProt and Ribo-seq derived proteoforms) led to 9,151,086 peptide-to-spectrum matches (PSMs). Further filtering and aggregation of the data was performed in R, leading to 8,501,009 filtered PSMs and 2,789,079 unique peptides belonging to 26,159 proteoforms.…”
Section: Selection Of N-terminal Proteoforms For Interaction Mapping | mentioning
confidence: 99%