As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at ∼58% compared with a peak at ∼42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at ∼42%, relatively low compared with that of protein-coding cDNAs.
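The GC-content comparison above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `gc_content` and `peak_bin` and the 2% bin width are assumptions chosen for illustration, and the input is taken to be plain nucleotide strings.

```python
from collections import Counter


def gc_content(seq):
    """Return GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def peak_bin(values, bin_width=2.0):
    """Return the (low, high) edges of the most populated histogram bin.

    bin_width is a hypothetical choice; the source does not state one.
    """
    counts = Counter(int(v // bin_width) for v in values)
    top_bin, _ = counts.most_common(1)[0]
    return top_bin * bin_width, (top_bin + 1) * bin_width
```

Applying `gc_content` to every cDNA in a set and passing the results to `peak_bin` gives the position of the distribution's peak, which is the quantity the abstract compares between predicted and nonpredicted cDNAs.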
Mammalian genomes produce huge numbers of noncoding RNAs (ncRNAs). However, the functions of most ncRNAs are unclear, and novel techniques that can distinguish functional ncRNAs are needed. Studies of mRNAs have revealed that the half-life of each mRNA is closely related to its physiological function, raising the possibility that the RNA stability of an ncRNA reflects its function. In this study, we first determined the half-lives of 11,052 mRNAs and 1,418 ncRNAs in HeLa Tet-off (TO) cells by developing a novel genome-wide method, which we named 5′-bromo-uridine immunoprecipitation chase-deep sequencing analysis (BRIC-seq). This method involved pulse-labeling endogenous RNAs with 5′-bromo-uridine and measuring the ongoing decrease in RNA levels over time using multifaceted deep sequencing. By analyzing the relationship between RNA half-lives and functional categories, we found that RNAs with a long half-life (t1/2 ≥ 4 h) contained a significant proportion of ncRNAs, as well as mRNAs involved in housekeeping functions, whereas RNAs with a short half-life (t1/2 < 4 h) included known regulatory ncRNAs and regulatory mRNAs. The stabilities of a significant set of short-lived ncRNAs are regulated by external stimuli, such as retinoic acid treatment. In particular, we identified and characterized several novel long ncRNAs involved in cell proliferation from the group of short-lived ncRNAs. We designated this novel class of ncRNAs with a short half-life as Short-Lived noncoding Transcripts (SLiTs). We propose that the strategy of monitoring RNA half-life will provide a powerful tool for investigating hitherto functionally uncharacterized regulatory RNAs.
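The half-life measurement described above can be sketched as follows. This is a minimal illustration and not the BRIC-seq analysis itself: it assumes simple first-order decay of the labeled RNA, fits the log-transformed levels by ordinary least squares, and uses the hypothetical helper names `estimate_half_life` and `classify_stability`. Only the 4 h cutoff comes from the abstract.

```python
import math


def estimate_half_life(times_h, levels):
    """Estimate an RNA half-life (hours) from a chase time course.

    Assumes first-order decay, level(t) = level(0) * exp(-k * t), and fits
    ln(level) against time by least squares. Levels must be positive.
    """
    if len(times_h) != len(levels) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(x) for x in levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    decay_rate = -sxy / sxx  # k in the decay model
    if decay_rate <= 0:
        return math.inf  # no measurable decay over the time course
    return math.log(2) / decay_rate


def classify_stability(half_life_h, threshold_h=4.0):
    """Label an RNA 'long' (t1/2 >= 4 h) or 'short' (t1/2 < 4 h)."""
    return "long" if half_life_h >= threshold_h else "short"
```

For example, relative levels of 1.0, 0.5, and 0.25 at 0, 2, and 4 h correspond to a half-life of 2 h, which falls in the short-lived class.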
By analyzing 1,780,295 5′-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by more than 500 bp and thus are very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7,674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with the composition of one CpG-island-containing promoter per 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of the tissue-specific PAPs were testis and brain. It was also intriguing that the PAP-containing promoters were enriched in the genes encoding signal transduction-related proteins and were rarer in the genes encoding extracellular proteins, possibly reflecting the varied functional requirement for and the restricted expression of those categories of genes, respectively. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different splicing types of first exons per locus partly produced by the PAPs, suggesting that a wide variety of transcripts can be achieved by this mechanism. Our findings suggest that use of alternate promoters and consequent alternative use of first exons should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans. [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to DDBJ under accession nos. DA000001-DA999999, DB000001-DB294747, DB294748-DB384947, BP192706-BP383670, AU279383-AU280837, and AU116788-U160826.]
One of the most striking findings revealed by the Human Genome Project is that the human genome contains only 20,000-25,000 kinds of protein-coding genes (International Human Genome Sequencing Consortium 2004). This number is unexpectedly small compared with the total gene numbers in yeast, fly, and worm genomes, which are estimated to be 6,000, 14,000, and 19,000, respectively (Goffeau et al. 1996; C. elegans Sequencing Consortium 1998; Adams et al. 2000). It is supposed that there must be other factors in addition to mere gene numbers to satisfy the prerequisites that enable the human genome to fabricate such highly elaborated systems as the brain and immune systems. To explain this, it has been hypothesized that multifaceted use of the genes should play a pivotal role in functional
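The grouping of TSS positions into clusters separated by more than 500 bp, described in the abstract above, can be sketched as a simple gap-based pass over sorted coordinates. This is an assumption about how such a rule could be applied, not the authors' documented procedure: the function name `cluster_tss` is hypothetical, and the sketch handles positions from a single gene on a single strand.

```python
def cluster_tss(positions, max_gap=500):
    """Group TSS coordinates so that neighboring clusters are > max_gap bp apart.

    A position joins the current cluster when it lies within max_gap bp of the
    previous position; otherwise it opens a new cluster.
    """
    clusters = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters
```

Each resulting cluster would correspond to one putative alternative promoter under this rule, so a gene whose TSSs fall into more than one cluster would count as PAP-containing.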
Appropriate resources and expression technology necessary for human proteomics on a whole-proteome scale are being developed. We prepared a foundation for simple and efficient production of human proteins using the versatile Gateway vector system. We generated 33,275 human Gateway entry clones for protein synthesis, developed mRNA expression protocols for them and improved the wheat germ cell-free protein synthesis system. We applied this protein expression system to the in vitro expression of 13,364 human proteins and assessed their biological activity in two functional categories. Of the 75 tested phosphatases, 58 (77%) showed biological activity. Several cytokines containing disulfide bonds were produced in an active form in a nonreducing wheat germ cell-free expression system. We also manufactured protein microarrays by direct printing of unpurified in vitro-synthesized proteins and demonstrated their utility. Our 'human protein factory' infrastructure includes the resources and expression technology for in vitro proteome research.