2018
DOI: 10.1186/s13059-018-1590-2

CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise

Abstract: We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project, to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which …

Cited by 296 publications (290 citation statements)
References 57 publications
“…Additionally, about 5,700-5,800 genes in human and 2,600-3,900 in mouse produce more than one protein in those conditions. These estimates are considerably lower than what is generally predicted from RNA expression (Pertea et al 2018). This may be explained by the limited coverage of Ribo-seq reads but may be also due to the fact that RNA-seq artificially amplifies fragments of unproductive RNAs leading to many false positives.…”
Section: Discussion (mentioning)
confidence: 61%
“…Differential production of transcript isoforms, especially through the mechanism of alternative splicing, is crucial in multiple biological processes such as cell differentiation, acquisition of tissue-specific functions, and DNA repair (Fiszbein and Kornblihtt 2017; Baralle and Giudice 2017; Shkreta and Chabot 2015), as well as in multiple pathologies (Ward and Cooper 2010; Singh and Eyras 2017; Cummings et al 2017). Although analysis of RNA sequencing (RNA-seq) data from multiple samples has indicated a large diversity of transcript molecules (Pertea et al 2018), genes express mostly one single isoform in any given condition and this isoform may change across conditions (Gonzàlez-Porta et al 2013; Sebestyén et al 2015).…”
Section: Introduction (mentioning)
confidence: 99%
“…To predict whether ERV-ORFs are expressed as proteins, we performed comparative analyses using a wide-range of transcriptome and histone mark data generated by next-generation sequencing. We first compared observed ERV-ORFs from our study with quantified transcriptome data derived from 31 human tissues generated in the GTEx study (Carithers et al 2015) from the CHESS database (Pertea et al 2018). We found that a total of 279 ERV-ORFs overlapped with transcripts in the CHESS database.…”
Section: Omics Analyses of ERV-ORFs (mentioning)
confidence: 99%
“…One of the most comprehensive RNA-Seq datasets was generated by the genotype-tissue expression (GTEx) study using 31 human tissues (Carithers et al 2015). The data was used to assemble a similar pipeline in the Comprehensive Human Expressed SequenceS (CHESS) project and served to generate 20,352 potential protein-coding genes, and 116,156 novel transcripts in the human genome (Pertea et al 2018). A comparative analysis study focusing on the protein-coding region of ERVs using the new transcript data, in combination with cap analysis of gene expression (CAGE) data (Lizio et al 2015), and epigenomic data will provide new insight into the transcriptional potential, and functionality of these regions.…”
Section: Introduction (mentioning)
confidence: 99%
“…ClusterBurden was devised to be suitable for scanning large-scale whole-exome sequencing projects designed to identify novel pathogenic genes for rare Mendelian diseases. The combination of FE and BIN-test to model clustering and burden minimizes the computation overhead required to calculate p-values making analyses of >20,000 genes [40] in hundreds of thousands of cases and controls, practical in terms of execution time and computer memory requirements. For genes with a significant BIN-test 'cluster' p-value, ClusterBurden calculated considerably lower p-values than the traditional Fisher's exact 'burden' test, implying an increase in statistical power (Table 1).…”
Section: ClusterBurden (mentioning)
confidence: 99%