Several recent studies have presented evidence that the human gene catalogue should be expanded to include thousands of short open reading frames (ORFs) appearing upstream or downstream of existing protein-coding genes, each of which would comprise an additional bicistronic transcript in humans. Here we explore an alternative hypothesis that would explain the translational and evolutionary evidence for these upstream ORFs without the need to create novel genes or bicistronic transcripts. We examined 2,199 upstream ORFs that have been proposed as high-quality candidates for novel genes, to determine if they could instead represent protein-coding exons that can be added to existing genes. We checked for the conservation of these ORFs in four recently sequenced, high-quality human genomes, and found a large majority (87.8%) to be conserved in all four, as expected. We then looked for splicing evidence that would connect each upstream ORF to the downstream protein-coding gene at the same locus, thus creating a novel splicing variant using the upstream ORF as its first exon. These protein-coding exon candidates were further evaluated using protein structure predictions of the protein sequences that included the proposed new exons. We determined that 582 out of 2,199 upstream ORFs have strong evidence that they can form protein-coding exons that are part of an existing gene, and that the resulting protein is predicted to have similar or better structural quality than the currently annotated isoform.
The high complexity of eukaryotic organisms enabled their evolutionary success, which became possible due to the diversification of eukaryotic proteomes. Various mechanisms contributed to this process. Alternative splicing had the largest known impact among these mechanisms: tens or hundreds of protein isoforms can be produced from a single genetic locus. Earlier, we hypothesized that along with alternative splicing, a different but conceptually similar mechanism creates novel versions of existing proteins in all eukaryotes. However, this mechanism acts at the level of translation, where the novelty of an amino acid sequence is achieved via multiple programmed ribosomal frameshifting events. This mechanism, which is termed mosaic translation, is very difficult to demonstrate even with the most up-to-date molecular tools. Thus, it has remained unnoticed so far. Using only a portion of all mass spectrometry proteomic data generated from various organs of the model plant Medicago truncatula, we attempted the first step toward the experimental proof of this hypothesis. Our original in silico approach resulted in the discovery of two candidates for mosaic proteins (homologs of EF1α and RuBisCo) and 154 candidates for chimeric peptides. Chimeric peptides and polypeptides are produced in the course of one ribosomal frameshifting event and may correspond to parts of mosaic proteins. In addition, our analysis reveals the possibility of translation of chimeric peptides from five ribosomal RNA transcripts, ten long non-coding RNA transcripts, and one transfer RNA transcript. These findings are novel and will be the basis for experimental validation in future studies. In this work, we present multiple lines of indirect evidence that support the validity of our in silico data.
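
To make the idea of a chimeric peptide concrete, the following is a minimal sketch of how a single ribosomal frameshifting event changes the translated sequence. It is an illustration only and is not the authors' in silico approach, which is not described in this abstract. The function names, the toy sequence, and the choice of the standard genetic code written in DNA letters are all assumptions made for the example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
# DNA letters (T instead of U) are used for convenience; this is an assumption
# of the sketch, not something taken from the abstract.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, start=0):
    """Translate seq from `start` until a stop codon or the last full codon."""
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def translate_with_frameshift(seq, codons_before_shift, shift):
    """Hypothetical helper: read `codons_before_shift` codons in frame 0,
    then move the reading position by `shift` bases (+1 or -1) and continue.
    The result models a chimeric peptide from one frameshifting event."""
    pos = 3 * codons_before_shift
    head = translate(seq[:pos])
    if len(head) < codons_before_shift:
        # A stop codon was reached before the frameshift site.
        return head
    return head + translate(seq, start=pos + shift)


# Toy sequence invented for this example (not from the paper).
toy = "ATGGCTAAAGGTTTCTGA"
in_frame = translate_with_frameshift(toy, 3, 0)    # no frameshift
plus_one = translate_with_frameshift(toy, 3, +1)   # +1 frameshift after codon 3
minus_one = translate_with_frameshift(toy, 3, -1)  # -1 frameshift after codon 3
print(in_frame, plus_one, minus_one)
```

In this toy case the first three residues are shared by all three peptides, while the residues after the shift site differ, which is the signature a chimeric peptide would carry. A mosaic protein, as described above, would correspond to several such events along one transcript.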