2022
DOI: 10.1101/2022.12.21.521274
Preprint

CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure

Abstract: The original CHESS database of human genes was assembled from nearly 10,000 RNA sequencing experiments in 53 human body sites produced by the Genotype-Tissue Expression (GTEx) project, and then augmented with genes from other databases to yield a comprehensive collection of protein-coding and noncoding transcripts. The construction of the new CHESS 3 database employed improved transcript assembly algorithms, a new machine learning classifier, and protein structure predictions to identify genes and transcripts …

Cited by 12 publications (16 citation statements). References: 40 publications.
“…And because RNA-seq datasets often produce large numbers of novel transcripts, the efficiency and scalability of ORFanage make it suitable for datasets of any size. We have recently applied our method to annotate ORFs in novel transcripts for the revised CHESS 3 [4] catalog, and to help identify novel structurally stable isoforms that were then confirmed using AlphaFold2 [56].…”
Section: Discussion (mentioning)
“…This optimization technique does not require sequence alignment or pre-computed genome indices, greatly reducing the computational burden of running the tool and making the analysis far more efficient than an alignment-based approach. We have tested ORFanage on datasets comprising tens of millions of transcripts assembled from thousands of RNA-seq experiments [3, 4] and found that it runs robustly on these data.…”
Section: Methods (mentioning)