2015
DOI: 10.1038/srep10940

Revealing Missing Human Protein Isoforms Based on Ab Initio Prediction, RNA-seq and Proteomics

Abstract: Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues/cell lines. PCR followed by MiSeq sequencing showed that at least 84.1% of these predicted novel splice sites could be validated. In contrast to…

Cited by 54 publications (22 citation statements)
References 67 publications

“…In one project, ab initio gene prediction was used to generate a set of 8 million candidate transcripts. Subsequent filtering and validation by 26 RNA‐seq data sets and shotgun proteomics revealed 36 novel proteins and more than 31,000 new transcripts. For generating the intropolis resource, this approach was taken even further.…”
Section: Is There Consensus On the Low‐hanging Fruits? (mentioning)
confidence: 99%
“…Alternative splicing is common, some 95% of multiexon genes could undergo alternative splicing [119][120][121], but it is unclear how many forms are biologically relevant as many of them are extremely rare, restrained to a few cell types and may thus not be the explanation for the majority of complexity of proteome [122]. Combined RNA sequencing and proteomics data along with bioinformatic predictions indicated 72% of human genes to have alternative splice forms that could be translated to proteins [123]. Analysis of functionally distinct splice forms in over 700 human and mouse genes, biased towards literature notions of alternative splicing, indicated that just a small fraction of the transcripts was functionally distinct [124].…”
Section: Protein-coding RNA (mentioning)
confidence: 99%
“…Most often, transcriptomics studies focus on the expression of protein-coding genes. However, the human transcriptome also includes non-coding RNA, and may contain up to 350,000 different transcripts [94]. Gene expression data from published transcriptomics studies are generally deposited in the public data repositories Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) or Array Express (https://www.ebi.ac.uk/arrayexpress/).…”
Section: Box 1 High-throughput Technologies To Profile the Host Respo… (mentioning)
confidence: 99%