Transcriptome and ribosome sequencing have revealed the existence of many non-canonical transcripts, mainly containing splice variants, ncRNA, sORFs and altORFs. However, identification and characterization of products that may be translated out of these remains a challenge. Addressing this, we here report on 552 non-canonical proteins and splice variants in the model organism C. elegans using tandem mass spectrometry. Aided by sequencing-based prediction, we generated a custom proteome database tailored to search for non-canonical translation products of C. elegans. Using this database, we mined available mass spectrometric resources of C. elegans, from which 51 novel, non-canonical proteins could be identified. Furthermore, we utilized diverse proteomic and peptidomic strategies to detect 40 novel non-canonical proteins in C. elegans by LC-TIMS-MS/MS, of which 6 were common with our meta-analysis of existing resources. Together, this permits us to provide a resource with detailed annotation of 467 splice variants and 85 novel proteins mapped onto UTRs, non-coding regions and alternative open reading frames of the C. elegans genome.
With the rapid growth in the number of sequenced genomes, genome annotation efforts became almost exclusively reliant on automated pipelines. Despite their unquestionable utility, these methods have been shown to underestimate the true complexity of the studied genomes, with small open reading frames (sORFs; ORFs typically considered shorter than 300 nucleotides) and, in consequence, their protein products (sORF encoded polypeptides or SEPs) being the primary example of a poorly annotated and highly underexplored class of genomic elements. With the advent of advanced translatomics such as ribosome profiling, reannotation efforts have progressed a great deal in providing translation evidence for numerous, previously unannotated sORFs. However, proteomics validation of these riboproteogenomics discoveries remains challenging due to their short length and often highly variable physiochemical properties. In this work we evaluate and compare tailored, yet easily adaptable, protein extraction methodologies for their efficacy in the extraction and concomitantly proteomics detection of SEPs expressed in the prokaryotic model pathogen Salmonella typhimurium (S. typhimurium). Further, an optimized protocol for the enrichment and efficient detection of SEPs making use of the of amphipathic polymer amphipol A8-35 and relying on differential peptide vs. protein solubility was developed and compared with global extraction methods making use of chaotropic agents. Given the versatile biological functions SEPs have been shown to exert, this work provides an accessible protocol for proteomics exploration of this fascinating class of small proteins.
Bioactive peptides exhibit key roles in a wide variety of complex processes, such as regulation of body weight, learning, aging, and innate immune response. Next to the classical bioactive peptides, emerging from larger precursor proteins by specific proteolytic processing, a new class of peptides originating from small open reading frames (sORFs) have been recognized as important biological regulators. But their intrinsic properties, specific expression pattern and location on presumed non-coding regions have hindered the full characterization of the repertoire of bioactive peptides, despite their predominant role in various pathways. Although the development of peptidomics has offered the opportunity to study these peptides in vivo, it remains challenging to identify the full peptidome as the lack of cleavage enzyme specification and large search space complicates conventional database search approaches. In this study, we introduce a proteogenomics methodology using a new type of mass spectrometry instrument and the implementation of machine learning tools toward improved identification of potential bioactive peptides in the mouse brain. The application of trapped ion mobility spectrometry (tims) coupled to a time-of-flight mass analyzer (TOF) offers improved sensitivity, an enhanced peptide coverage, reduction in chemical noise and the reduced occurrence of chimeric spectra. Subsequent machine learning tools MS2PIP, predicting fragment ion intensities and DeepLC, predicting retention times, improve the database searching based on a large and comprehensive custom database containing both sORFs and alternative ORFs. Finally, the identification of peptides is further enhanced by applying the post-processing semi-supervised learning tool Percolator. Applying this workflow, the first peptidomics workflow combined with spectral intensity and retention time predictions, we identified a total of 167 predicted sORF-encoded peptides, of which 48 originating from presumed non-coding locations, next to 401 peptides from known neuropeptide precursors, linked to 66 annotated bioactive neuropeptides from within 22 different families. Additional PEAKS analysis expanded the pool of SEPs on presumed non-coding locations to 84, while an additional 204 peptides completed the list of peptides from neuropeptide precursors. Altogether, this study provides insights into a new robust pipeline that fuses technological advancements from different fields ensuring an improved coverage of the neuropeptidome in the mouse brain.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.