2018
DOI: 10.1101/332825
Preprint

Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise

Abstract: We assembled the sequences from 9,795 RNA sequencing experiments, collected from 31 human tissues and hundreds of subjects as part of the GTEx project, to create a new, comprehensive catalog of human genes and transcripts. The new human gene database contains 43,162 genes, of which 21,306 are protein-coding and 21,856 are noncoding, and a total of 323,824 transcripts, for an average of 7.5 transcripts per gene. Our expanded gene list includes 4,998 novel genes (1,178 coding and 3,819 noncoding) and 97,511 nove…

Cited by 40 publications (32 citation statements)
References 54 publications
“…We discovered 8.4 to 22Mb of novel transcription across all tissues, consistent with previous reports that annotation remains incomplete 26,27 . Novel ERs predominantly fell into intragenic regions suggesting we were improving the annotation of known genes rather than discovering new genes (Figure 2a).…”
Section: Novel Transcription Is Widespread Across All Human Tissues A
supporting
confidence: 90%
“…The transcription of unannotated genome sequence is evidenced by transcriptomic studies in which raw reads do not align to known genes (Wu and Knudson, 2018;Pertea et al, 2018b;Delcourt et al, 2018;Struhl, 2007;Lu et al, 2017). Many dismiss this genome expression as "noise" (Pertea et al, 2018b,a;Consortium et al, 2012;Barroso et al, 2018;Lloréns-Rico et al, 2016), However, some of this sequence are translated (Wu and Knudson, 2018;Carvunis et al, 2012;Smith et al, 2014;Olexiouk et al, 2017;Hsu et al, 2016;Ruiz-Orera et al, 2015;Chew et al, 2013), and several functional genes have been identified from it (Ji et al, 2015;Andrews and Rothnagel, 2014).…”
Section: Introduction
mentioning
confidence: 99%
“…Over 500 million years of evolution from hydra to humans, the total number of ORFs have been thought to remain the same at around 30,000. With the advent of deep sequencing strategies in both genomics and proteomics fields, we are now discovering nORFs that have remained undiscovered or 'hidden' 1,2,5,6 . These nORFs are pervasive throughout the genome and are observed in both the coding and non coding regions 1,7 .…”
mentioning
confidence: 99%