Based on their expression patterns, genomes can be viewed as collections of protein-coding, RNA-coding, and non-expressing DNA sequences. Unlike gene expression in most prokaryotes, eukaryotic gene expression involves an additional step called alternative splicing. During pre-mRNA maturation, introns are spliced out and different combinations of exons are joined together, producing distinct mRNA isoforms. After removal from pre-mRNA, introns may be degraded by cellular exonucleases, give rise to long non-coding RNAs (lncRNAs), or be temporarily retained in the nucleus to regulate gene expression. We asked: Do introns have untapped potential for encoding proteins? If introns were translated, what kinds of peptides or proteins would they make? This study is based on the hypothesis that functional proteins can be made from leftover introns, and it extends the original work on making functional proteins from E. coli intergenic sequences (Dhar et al., 2009). Here, full-length introns were computationally translated into proteins, and the predicted structural, physicochemical, functional, and subcellular localization properties of these proteins were studied. Experimental validation is underway to develop a detailed understanding of the biology of intronic proteins. A repository of synthetic intronic proteins would provide an opportunity to design first-in-class molecules toward functional endpoints.
From a functional standpoint, a genome may be considered a collection of three types of sequences: protein-encoding, RNA-encoding, and non-expressing. Based on previous sequencing and annotation work, it is now well accepted that a small proportion of the genome encodes proteins, most of the genome encodes RNA, and some DNA sequences are not used for expression at all. The exact ratio among these three types of sequences varies by organism. We asked: Is it possible to artificially encode protein and peptide sequences from naturally non-expressing (dark genome) sequences? This question led to a proof of concept for making functional proteins from the intergenic sequences of E. coli (Dhar et al., 2009). The present study extends the original concept and is organized around antisense DNA sequences. The full-length antisense gene equivalents, in forward and reverse orientations, were computationally studied for their structural, subcellular localization, and functional properties, leading to a number of interesting observations. The study points to a large, untapped genomic space that needs to be examined from cell-physiology, evolutionary, and application perspectives.
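The antisense gene equivalents can be derived from a sense sequence roughly as sketched below. The abstract does not define the two orientations, so this minimal Python sketch assumes that the forward orientation is the antisense strand read 5'→3' (the reverse complement of the sense strand) and the reverse orientation is the same strand read 3'→5' (the plain complement). The standard genetic code is assumed for translation, and the names `antisense_orientations` and `translate`, as well as the example gene, are hypothetical.

```python
# Minimal sketch: derive antisense sequences in two orientations and translate them.
# Assumptions (not from the study): "forward" = reverse complement (antisense 5'->3'),
# "reverse" = plain complement (antisense read 3'->5'), standard genetic code (NCBI table 1).

BASES = "TCAG"
# Amino acids for all 64 codons in TTT, TTC, TTA, TTG, TCT, ... GGG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_orientations(sense: str) -> dict:
    """Return the antisense strand of a sense DNA sequence in both orientations."""
    complement = sense.translate(COMPLEMENT)
    return {"forward": complement[::-1], "reverse": complement}


def translate(dna: str) -> str:
    """Translate in frame 0; stops are kept as '*', a partial final codon is dropped."""
    seq = dna.upper()
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X") for pos in range(0, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    gene = "ATGAAACCCGGGTAA"  # hypothetical sense ORF (MKPG*), not from the study
    for orientation, seq in antisense_orientations(gene).items():
        print(orientation, seq, translate(seq))
```

For this toy gene the forward orientation is `TTACCCGGGTTTCAT` (translating to `LPGFH`) and the reverse orientation is `TACTTTGGGCCCATT` (translating to `YFGPI`). If the study defines the orientations differently, only `antisense_orientations` would need to change.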