Detection of Alternative Splice Variants at the Proteome Level in Aspergillus flavus

Chang, Kung-Yen; Georgianna, D. Ryan; Heber, Steffen; Payne, Gary A.; Muddiman, David C.

doi:10.1021/pr900602d

Cited by 28 publications

(30 citation statements)

References 49 publications


“…Another important issue in the discovery of alternative splice forms at the protein level is the low number of splice-specific peptides actually identified, an issue that has been revealed by work reported in the literature (17,19,22,24,25,30,31). Part of the reason for the low number of alternative splice variants detected are the technical differences between RNA-Seq and bottom-up proteomics, namely sequence coverage and detection sensitivity.…”

Section: Discussion

mentioning

confidence: 99%

“…In this approach, exon coordinates are first determined by obtaining exon sequences from databases such as Ensembl or by using ab initio computational algorithms to predict the location of putative exon boundaries. Next, these exon sequences are assembled into all theoretical exon-exon (and exon-intron) combinations, and then the sequences are translated into polypeptide sequences and used for MS-based searching to discover novel splice variant peptides (27)(28)(29)(30). To extend this approach, several research groups have restricted their exon-exon database to include only those sequences corroborated with transcript expression data (31)(32)(33), thereby eliminating spurious sequences.…”

mentioning

confidence: 99%
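The exon-exon database construction described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' pipeline: the function names, the `flank` parameter, and the toy exon sequences are assumptions, and real workflows would also handle reverse strands, exon-intron combinations, and stop-codon trimming.

```python
from itertools import combinations

# Standard genetic code, indexed in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate a DNA string in the given reading frame; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def junction_peptides(exons, flank=30):
    """Join every upstream/downstream exon pair and translate the junction in three frames.

    `flank` (hypothetical parameter) is the number of nucleotides kept on each side
    of the junction. Returns a dict keyed by (upstream index, downstream index, frame).
    """
    out = {}
    for (i, up), (j, down) in combinations(enumerate(exons), 2):
        junction = up[-flank:] + down[:flank]
        for frame in range(3):
            out[(i, j, frame)] = translate(junction, frame)
    return out


# Toy exons (made up for illustration only).
exons = ["ATGGCC", "AAATTT", "GGGTGA"]
candidates = junction_peptides(exons, flank=6)
```

Enumerating all pairs grows quadratically with exon count, which is one reason the statement notes that some groups restrict the database to junctions corroborated by transcript expression data.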


Discovery and Mass Spectrometric Analysis of Novel Splice-junction Peptides Using RNA-Seq

Sheynkman, Shortreed, Frey, et al. 2013

Molecular & Cellular Proteomics



Human proteomic databases required for MS peptide identification are frequently updated and carefully curated, yet are still incomplete because it has been challenging to acquire every protein sequence from the diverse assemblage of proteoforms expressed in every tissue and cell type. In particular, alternative splicing has been shown to be a major source of this cell-specific proteomic variation. Many new alternative splice forms have been detected at the transcript level using next generation sequencing methods, especially RNA-Seq, but it is not known how many of these transcripts are being translated. Leveraging the unprecedented capabilities of next generation sequencing methods, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Eighty million paired-end Illumina reads and ~500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice junction sequences from the RNA data, translate these sequences into the analogous polypeptide sequence, and create a customized splice junction database for MS searching. Based on the RefSeq gene models, we detected 136,123 annotated and 144,818 unannotated transcript junctions. Of those, 24,834 unannotated junctions passed various quality filters (e.g. minimum read depth) and these entries were translated into 33,589 polypeptide sequences and used for database searching. We discovered 57 splice junction peptides not present in the Uniprot-Trembl proteomic database comprising an array of different splicing events, including skipped exons, alternative donors and acceptors, and noncanonical transcriptional start sites. To our knowledge this is the first example of using sample-specific RNA-Seq data to create a splice-junction database and discover new peptides resulting from alternative splicing.

Mass spectrometry-based proteomics relies on accurate databases to identify and quantify proteins, including those derived from splice variants, indels, and single nucleotide variants (SNVs) (1). Most computational search algorithms detect peptides by scoring the degree of similarity between in silico derived and experimental peptide spectra, and thus can only identify peptides that are present in the proteomic database. If the polypeptide sequence is not present in the database used for searching, even if the peptide is present in the sample, it will fail to be detected. Human proteomic databases used for mass spectrometric peptide identification are frequently updated and carefully curated, yet are still incomplete. Despite efforts to comprehensively annotate every gene product, there are still many undiscovered proteoforms (2) because the complete human proteome (the aggregate of all protein products expressed in every tissue, cell, and cellular state) turns out to be vastly more complex than was...
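The point that a search can only identify peptides present in the database, combined with the read-depth filter mentioned in the abstract, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the `min_reads` threshold of 3, the simplified trypsin rule with no missed cleavages, and the toy sequences. It does not reproduce the authors' workflow and does not check that a peptide actually spans the junction.

```python
import re


def tryptic_peptides(protein, min_len=6):
    """In silico tryptic digest: cleave after K or R unless the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}


def novel_junction_peptides(junctions, reference, min_reads=3, min_len=6):
    """Return peptides from well-supported junctions that are absent from the reference digest.

    `junctions` is a list of (translated junction sequence, supporting read count).
    `min_reads` is a hypothetical stand-in for the "minimum read depth" filter.
    """
    known = set()
    for protein in reference:
        known |= tryptic_peptides(protein, min_len)
    novel = set()
    for translated, reads in junctions:
        if reads < min_reads:
            continue  # junction too weakly supported by RNA-Seq reads
        novel |= tryptic_peptides(translated, min_len) - known
    return novel


# Toy data (made up): one reference protein and three candidate junctions.
reference = ["MAGTKPLLSRVVDEAAK"]
junctions = [
    ("LLSRVVDEAAK", 10),   # peptides already in the reference digest
    ("GGSRAAEELLNK", 5),   # yields a peptide missing from the reference
    ("TTTKWWWWWWWK", 1),   # dropped by the read-depth filter
]
novel = novel_junction_peptides(junctions, reference)
```

Only peptides in `novel` would need to be appended to the search database; anything absent from both sets stays invisible to the search engine, which is the limitation the passage describes.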


“…11 However, many experiments identify many fewer alternative isoforms than would be expected, even if the low peptide coverage is taken into account. 6,12–14 …”

Section: Introduction

mentioning

confidence: 99%

Most Highly Expressed Protein-Coding Genes Have a Single Dominant Isoform

et al. 2015


Although eukaryotic cells express a wide range of alternatively spliced transcripts, it is not clear whether genes tend to express a range of transcripts simultaneously across cells, or produce dominant isoforms in a manner that is either tissue-specific or regardless of tissue. To date, large-scale investigations into the pattern of transcript expression across distinct tissues have produced contradictory results. Here, we attempt to determine whether genes express a dominant splice variant at the protein level. We interrogate peptides from eight large-scale human proteomics experiments and databases and find that there is a single dominant protein isoform, irrespective of tissue or cell type, for the vast majority of the protein-coding genes in these experiments, in partial agreement with the conclusions from the most recent large-scale RNAseq study. Remarkably, the dominant isoforms from the experimental proteomics analyses coincided overwhelmingly with the reference isoforms selected by two completely orthogonal sources, the consensus coding sequence variants, which are agreed upon by separate manual genome curation teams, and the principal isoforms from the APPRIS database, predicted automatically from the conservation of protein sequence, structure, and function.


“…Many reasons have been suggested for the low proteome coverage including poor quality MS/MS spectra 8, 9 and incorrect genome annotation. 10, 11 There is also strong evidence that the primary reason for the low coverage is the DDA paradigm. 12, 13 DDA methods have been shown to limit the dynamic range of the analysis 14 leading to inadequate sampling of the proteome, even for relatively simple organisms.…”

Section: Introduction

mentioning

confidence: 99%

Accurate Peptide Fragment Mass Analysis: Multiplexed Peptide Identification and Quantification

et al. 2012


FT All Reaction Monitoring (FT-ARM) is a novel approach for the identification and quantification of peptides that relies upon the selectivity of high mass accuracy data and the specificity of peptide fragmentation patterns. An FT-ARM experiment involves continuous, data-independent, high mass accuracy MS/MS acquisition spanning a defined m/z range. Custom software was developed to search peptides against the multiplexed fragmentation spectra by comparing theoretical or empirical fragment ions against every fragmentation spectrum across the entire acquisition. A dot product score is calculated against each spectrum in order to generate a score chromatogram used for both identification and quantification. Chromatographic elution profile characteristics are not used to cluster precursor peptide signals to their respective fragment ions. FT-ARM identifications are demonstrated to be complementary to conventional data-dependent shotgun analysis, especially in cases where the data-dependent method fails due to fragmenting multiple overlapping precursors. The sensitivity, robustness and specificity of FT-ARM quantification are shown to be analogous to selected reaction monitoring-based peptide quantification with the added benefit of minimal assay development. Thus, FT-ARM is demonstrated to be a novel and complementary data acquisition, identification, and quantification method for the large scale analysis of peptides.
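The per-spectrum dot product and score chromatogram described in this abstract can be sketched roughly as below. This is not the FT-ARM software: the function names, the 10 ppm tolerance, the unit-intensity theoretical fragments, and the toy spectra are all assumptions chosen to keep the example small.

```python
import math


def dot_product_score(theoretical_mz, peaks, tol_ppm=10.0):
    """Normalized dot product between unit-intensity theoretical fragments and one spectrum.

    `peaks` is a list of (m/z, intensity). Each theoretical fragment takes the most
    intense observed peak within the ppm tolerance, or 0 if none matches.
    """
    if not theoretical_mz or not peaks:
        return 0.0
    matched = []
    for t in theoretical_mz:
        window = t * tol_ppm * 1e-6
        hits = [inten for mz, inten in peaks if abs(mz - t) <= window]
        matched.append(max(hits) if hits else 0.0)
    norm_theoretical = math.sqrt(len(theoretical_mz))
    norm_observed = math.sqrt(sum(inten ** 2 for _, inten in peaks))
    if norm_observed == 0.0:
        return 0.0
    return sum(matched) / (norm_theoretical * norm_observed)


def score_chromatogram(theoretical_mz, spectra, tol_ppm=10.0):
    """Score every spectrum in acquisition order; returns (retention time, score) pairs."""
    return [(rt, dot_product_score(theoretical_mz, peaks, tol_ppm))
            for rt, peaks in spectra]


# Toy acquisition (made up): three spectra at increasing retention times.
fragments = [100.0, 200.0]
spectra = [
    (10.0, [(150.0, 5.0)]),                  # no fragment matches
    (10.5, [(100.0, 3.0), (200.0, 4.0)]),    # both fragments match
    (11.0, [(100.0, 1.0), (200.0, 1.0)]),    # both match with equal intensity
]
trace = score_chromatogram(fragments, spectra)
```

The resulting trace rises where the fragment pattern is present, so the same list could serve both to call an identification (peak score) and to integrate a quantity (area under the trace), in the spirit of the abstract.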
