Interest in the use of machine learning for peptide fragmentation spectrum prediction has been strongly on the rise over the past few years, especially for applications in challenging proteomics identification workflows such as immunopeptidomics and the full-proteome identification of data-independent acquisition (DIA) spectra. Since its inception, the MS²PIP peptide spectrum predictor has been widely used for various downstream applications, mostly thanks to its accuracy, ease of use, and broad applicability. We here present a thoroughly updated version of the MS²PIP web server, which includes new and more performant prediction models for both tryptic and non-tryptic peptides, for immunopeptides, and for CID-fragmented TMT-labeled peptides. We have also added new functionality to greatly facilitate the generation of proteome-wide predicted spectral libraries, requiring only a FASTA protein file as input. These libraries also include retention time predictions from DeepLC. Moreover, we now provide pre-built and ready-to-download spectral libraries for various model organisms in multiple DIA-compatible spectral library formats. Besides the upgraded back-end models, the user experience on the MS²PIP web server is thus also greatly enhanced, extending its applicability to new domains, including immunopeptidomics and MS3-based TMT quantification experiments. MS²PIP is freely available at https://iomics.ugent.be/ms2pip/.
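As an illustration of what building a library from only a FASTA file involves, the first step is typically an in-silico digestion of each protein into candidate peptides. The sketch below is not the MS²PIP web server's implementation; it assumes a plain trypsin rule (cleave after K or R, but not before P), and the function names, the length limits, and the missed-cleavage parameter are hypothetical choices made for this example.

```python
import re
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    proteins: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            proteins[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so concatenate them.
            proteins[header] += line
    return proteins


def tryptic_peptides(
    sequence: str,
    missed_cleavages: int = 0,
    min_length: int = 6,
    max_length: int = 30,
) -> List[str]:
    """Digest a protein with an assumed trypsin rule: after K/R, not before P."""
    # Zero-width split points; empty trailing fragments are dropped.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides: List[str] = []
    for start in range(len(fragments)):
        last = min(start + missed_cleavages + 1, len(fragments))
        for end in range(start, last):
            # Joining adjacent fragments models missed cleavage sites.
            peptide = "".join(fragments[start:end + 1])
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    demo = ">P1 demo\nAAAAAAKCCC\nCCCRPDDDDDDK\n"
    for name, seq in parse_fasta(demo).items():
        print(name, tryptic_peptides(seq, missed_cleavages=1))
```

Each resulting peptide, combined with a set of precursor charge states, would then be passed to the fragmentation and retention time predictors. Non-tryptic applications such as immunopeptidomics would need a different peptide enumeration rule.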
Immunopeptidomics aims to identify immunopeptides, which are presented on Major Histocompatibility Complexes (MHC) on every cell and can be used to develop vaccines against pathogens and cancer. However, existing immunopeptidomics data analysis pipelines have some major hurdles to overcome, mostly due to the non-tryptic nature of immunopeptides, which complicates their identification. Previously, the machine and deep learning tools MS2PIP and DeepLC have been shown to improve tryptic peptide identifications by accurately predicting tandem mass spectrometry (MS2) peak intensities and retention times, respectively, and by using these predictions to rescore peptide-spectrum matches (PSMs) with the post-processing tool Percolator. However, MS2PIP was still tailored towards tryptic peptides, and the fragmentation patterns of immunopeptides are drastically different. To enable MS2PIP-based rescoring of immunopeptide PSMs, we have retrained MS2PIP to include non-tryptic peptides. These newly trained MS2PIP models greatly improve the predictions for immunopeptides and, surprisingly, also for tryptic peptides. Next, the new MS2PIP models, DeepLC, and Percolator were integrated into one software package, called MS2Rescore. Using MS2Rescore, 46% more spectra and 36% more unique peptides were identified at 1% false discovery rate (FDR), with even more extreme differences at 0.1% FDR, in comparison with standard Percolator rescoring. Due to the innovative extraction of MS2PIP-, DeepLC-, and search engine-based features, MS2Rescore even outperforms current state-of-the-art immunopeptide rescoring efforts. Thus, the integration of the new immunopeptide MS2PIP models, DeepLC, and Percolator into MS2Rescore shows great promise to substantially improve the identification of novel epitopes from immunopeptidomics workflows.
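The identification gains above are reported at fixed FDR thresholds. As a rough illustration of how such a threshold is applied, the sketch below estimates q-values with the basic target-decoy approach, where the FDR at a score cutoff is approximated as the number of decoys divided by the number of targets above that cutoff. This is a simplified estimator chosen for the example and is not necessarily the procedure used by Percolator or MS2Rescore; the function name and the input layout are assumptions.

```python
from typing import List, Tuple


def target_decoy_q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """Return one q-value per PSM, in input order.

    Each PSM is a (score, is_decoy) pair; a higher score means a better match.
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # Estimated FDR when accepting everything down to each rank.
    fdrs: List[float] = []
    targets = decoys = 0
    for index in order:
        if psms[index][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the lowest FDR at which a PSM would still be accepted,
    # so take a running minimum from the worst rank upwards.
    q_values = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
    q = target_decoy_q_values(example)
    accepted = [p for p, qv in zip(example, q) if not p[1] and qv <= 0.01]
    print(len(accepted), "target PSMs accepted at 1% FDR")
```

Rescoring changes the scores that feed into this ranking; better separation between target and decoy scores lets more target PSMs pass the same q-value threshold.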
A plethora of proteomics search engine output file formats are in circulation. This lack of standardized output files greatly complicates generic downstream processing of peptide-spectrum matches (PSMs) and PSM files. While standards exist to solve this problem, these are far from universally supported by search engines. Moreover, software libraries are available to read a selection of PSM file formats, but a package to parse PSM files into a unified data structure has been missing. Here, we present psm_utils, a Python package to read and write various PSM file formats and to handle peptidoforms, PSMs, and PSM lists through a unified and user-friendly Python API, command line interface, and web interface. psm_utils was developed with pragmatism and maintainability in mind, adhering to community standards and relying on existing packages where possible. The Python API and command line interface greatly facilitate handling various PSM file formats. Moreover, a user-friendly web application was built using psm_utils that allows anyone to interconvert PSM files and retrieve basic PSM statistics. psm_utils is freely available under the permissive Apache2 license at .
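To make the idea of a unified data structure concrete, the sketch below reads two differently laid out tab-separated PSM files into the same record type. It deliberately does not use the psm_utils API: the class name SimplePSM, the function read_psms, and both column layouts ("engine_a" and "engine_b") are hypothetical and exist only for this illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimplePSM:
    """Hypothetical minimal PSM record shared by all input formats."""
    spectrum_id: str
    peptide: str
    charge: int
    score: float
    is_decoy: bool


# Assumed column names for two made-up search engine outputs.
COLUMN_MAPS: Dict[str, Dict[str, str]] = {
    "engine_a": {"spectrum_id": "scan", "peptide": "sequence",
                 "charge": "z", "score": "xcorr", "is_decoy": "decoy"},
    "engine_b": {"spectrum_id": "SpecID", "peptide": "Peptide",
                 "charge": "Charge", "score": "Score", "is_decoy": "IsDecoy"},
}


def read_psms(text: str, fmt: str) -> List[SimplePSM]:
    """Parse tab-separated PSM text in the given layout into SimplePSM records."""
    columns = COLUMN_MAPS[fmt]
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    psms: List[SimplePSM] = []
    for row in reader:
        decoy_flag = row[columns["is_decoy"]].strip().lower()
        psms.append(SimplePSM(
            spectrum_id=row[columns["spectrum_id"]],
            peptide=row[columns["peptide"]],
            charge=int(row[columns["charge"]]),
            score=float(row[columns["score"]]),
            is_decoy=decoy_flag in {"1", "true", "yes"},
        ))
    return psms


if __name__ == "__main__":
    file_a = "scan\tsequence\tz\txcorr\tdecoy\n1\tACDEK\t2\t3.5\t0\n"
    file_b = "SpecID\tPeptide\tCharge\tScore\tIsDecoy\n1\tACDEK\t2\t3.5\tfalse\n"
    print(read_psms(file_a, "engine_a") == read_psms(file_b, "engine_b"))
```

Once every format maps onto one record type, downstream steps such as filtering, rescoring, or writing to another format only need to be written once.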