RNA sequencing has led to the discovery of many transcript isoforms created by alternative splicing, but the translational status and functional significance of most alternative splicing events remain unknown. Here we applied a splice junction-centric approach to survey the landscape of alternative protein isoform expression in the human proteome. We focused on alternative splicing events in which both junctions of a pair, one corresponding to the included exon and one to the excluded exon, have appreciable read counts; the sequences spanning these junctions are translated together into selective protein sequence databases. Using this approach, we constructed tissue-specific FASTA databases from ENCODE RNA sequencing data, then reanalyzed splice junction peptides in existing mass spectrometry datasets across 10 human tissues (heart, lung, liver, pancreas, ovary, testis, colon, prostate, adrenal gland, and esophagus). Our analysis reidentified 1,108 non-canonical isoforms annotated in SwissProt. We further found 253 novel splice junction peptides in 212 genes that are not documented in the comprehensive UniProt TrEMBL or Ensembl RefSeq databases. On a proteome scale, non-canonical isoforms differ from canonical sequences preferentially in regions of heightened protein disorder, suggesting that one functional consequence of alternative splicing on the proteome is the regulation of intrinsically disordered regions. We further observed examples where isoform-specific regions intersect with important cardiac protein phosphorylation sites. Our results reveal previously unidentified protein isoforms and may aid efforts to elucidate the functions of splicing events and to expand the pool of observable biomarkers in profiling studies.
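The junction-pair filtering and translation step summarized above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the `SpliceEvent` record, the `build_fasta` function, the FASTA header layout, and the `min_reads` threshold of 4 are all hypothetical choices made for illustration, and the sketch assumes that the nucleotide sequence spanning each junction and its reading frame are already known.

```python
from dataclasses import dataclass

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


@dataclass
class SpliceEvent:
    """Hypothetical record for one alternative splicing event."""
    gene: str
    event_type: str      # e.g. "SE", "A3SS", "A5SS", "MXE", "RI"
    included_reads: int  # reads supporting the exon-included junction
    excluded_reads: int  # reads supporting the exon-excluded junction
    included_nt: str     # nucleotides spanning the included junction
    excluded_nt: str     # nucleotides spanning the excluded junction
    frame: int = 0       # reading frame offset (assumed known)


def translate(nt: str, frame: int = 0) -> str:
    """Translate nucleotides in one frame, stopping at the first stop codon."""
    residues = []
    for i in range(frame, len(nt) - 2, 3):
        residue = CODON_TABLE.get(nt[i:i + 3].upper(), "X")  # X = unknown codon
        if residue == "*":
            break
        residues.append(residue)
    return "".join(residues)


def build_fasta(events, min_reads: int = 4) -> str:
    """Keep events where BOTH junctions reach min_reads and emit FASTA text.

    min_reads is an illustrative threshold, not a value taken from the paper.
    """
    lines = []
    for ev in events:
        if ev.included_reads < min_reads or ev.excluded_reads < min_reads:
            continue  # one junction of the pair lacks read support
        for label, nt in (("included", ev.included_nt),
                          ("excluded", ev.excluded_nt)):
            peptide = translate(nt, ev.frame)
            if peptide:
                lines.append(f">{ev.gene}|{ev.event_type}|{label}")
                lines.append(peptide)
    return "\n".join(lines)


events = [
    SpliceEvent("GENE1", "SE", 12, 9, "ATGGCCAAA", "ATGAAA"),
    SpliceEvent("GENE2", "SE", 30, 1, "ATGGGGTTT", "ATGTTT"),  # filtered out
]
fasta_text = build_fasta(events)
print(fasta_text)
```

Requiring read support for both junctions of a pair keeps the database selective: an event enters the search space only when both the included and the excluded form are plausibly expressed in the tissue.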
Acronyms and Abbreviations
A3SS: alternative 3-prime splice site; A5SS: alternative 5-prime splice site; FDR: false discovery rate; IDR: intrinsically disordered regions; MXE: mutually exclusive exons; PSI: percent spliced in; PTC: premature termination codon; PTM: post-translational modifications; SE: skipped exon; RI: retained intron.