2017
DOI: 10.12688/f1000research.11119.1

Last rolls of the yoyo: Assessing the human canonical protein count

Abstract: In 2004, the protein estimate from the finished human genome was a surprisingly low 24,000, and the surprise was compounded as reviewed estimates fell to 19,000 by 2014. However, variability persists in the total canonical protein counts (i.e. excluding alternative splice forms) of open reading frames (ORFs) across different annotation portals. This work assesses these differences and their possible causes. A 16-year analysis of Ensembl and UniProtKB/Swiss-Prot shows convergence to a protein number of ~20,000. The former had shown s…

Cited by 12 publications (8 citation statements). References 34 publications.
“…As a result, the foreseen increase in smORF count in Swiss-Prot falls short, with an increase from 3.1% in 2009 to 3.3% in 2017 (Southan 2017). This means that despite the large number of smORF and alternative ORF discoveries, only a limited number make it through to genome annotation (Southan 2017). The current genome annotation system has been blamed for simplifying a transcript's definition, not taking into account their potential to hold multiple functional features (for review, see Mudge and Harrow 2016).…”
Section: Proposition of a Novel Annotation Framework (mentioning; confidence: 99%)
“…ORF-prediction algorithms apply the criteria of a single CDS per transcript, and a minimum length of 100 codons, unless the sequence bears high similarity to known proteins or domains (Furuno et al 2003; Pruitt et al 2012; Aken et al 2016). As a result, the foreseen increase in smORF count in Swiss-Prot falls short, with an increase from 3.1% in 2009 to 3.3% in 2017 (Southan 2017). This means that despite the large number of smORF and alternative ORF discoveries, only a limited number make it through to genome annotation (Southan 2017).…”
Section: Proposition of a Novel Annotation Framework (mentioning; confidence: 99%)
“…The number of lncRNA genes in the human genome has been estimated at 20,000 to 100,000 (Zhao et al, 2016; Fang et al, 2018; Uszczynska-Ratajczak et al, 2018). This number is greater than the canonical protein-coding genes in the human genome (Southan, 2017). lncRNAs are primarily retained in the nucleus, having short half-lives and a rapid turnover rate compared to mRNAs (Clark et al, 2012; Derrien et al, 2012; Yoon et al, 2015).…”
Section: Introduction (mentioning; confidence: 99%)
“…(42), their gene annotation is still requiring improvement, especially for non-model species. Also, the gene age inferred by GenOrigin is based on the orthology information of Ensembl Compara(14,15), which perform great for old genes but not for new genes.…”
(mentioning; confidence: 99%)