By analyzing 1,780,295 5′-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by more than 500 bp and thus are very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with the composition of one CpG-island-containing promoter per 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of the tissue-specific PAPs were testis and brain. It was also intriguing that the PAP-containing promoters were enriched in the genes encoding signal transduction-related proteins and were rarer in the genes encoding extracellular proteins, possibly reflecting the varied functional requirement for and the restricted expression of those categories of genes, respectively. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different splicing types of first exons per locus partly produced by the PAPs, suggesting that a wide variety of transcripts can be achieved by this mechanism. Our findings suggest that use of alternate promoters and consequent alternative use of first exons should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans. [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to DDBJ under accession nos. DA000001-DA999999, DB000001-DB294747, DB294748-DB384947, BP192706-BP383670, AU279383-AU280837, and AU116788-U160826.]

One of the most striking findings revealed by the Human Genome Project is that the human genome contains only 20,000-25,000 kinds of protein-coding genes (International Human Genome Sequencing Consortium 2004). This number is unexpectedly small compared with the total gene numbers in yeast, fly, and worm genomes, which are estimated to be 6,000, 14,000, and 19,000, respectively (Goffeau et al. 1996; C. elegans Sequencing Consortium 1998; Adams et al. 2000). It is supposed that there must be other factors in addition to mere gene numbers to satisfy the prerequisites that enable the human genome to fabricate such highly elaborated systems as the brain and immune systems. To explain this, it has been hypothesized that multifaceted use of the genes should play a pivotal role in functional
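The grouping of TSS positions into clusters separated by more than 500 bp, as described in the abstract above, can be illustrated with a simple gap-based pass over sorted coordinates. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the function name `cluster_tss`, the `(chrom, strand, pos)` input format, and the choice to measure the gap between adjacent TSS positions on the same chromosome and strand are all hypothetical.

```python
from collections import defaultdict


def cluster_tss(sites, min_gap=500):
    """Group TSS positions into clusters separated by more than min_gap bp.

    sites: iterable of (chrom, strand, pos) tuples (hypothetical input format).
    Returns a list of (chrom, strand, start, end, n_sites) tuples.
    Assumption: a new cluster starts whenever the distance to the previous
    TSS on the same chromosome and strand exceeds min_gap.
    """
    by_key = defaultdict(list)
    for chrom, strand, pos in sites:
        by_key[(chrom, strand)].append(pos)

    clusters = []
    for (chrom, strand), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > min_gap:
                # Gap is large enough: close the current cluster, open a new one.
                clusters.append((chrom, strand, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        clusters.append((chrom, strand, start, prev, count))
    return clusters


# Toy coordinates, invented for illustration only.
example_sites = [
    ("chr1", "+", 100),
    ("chr1", "+", 300),
    ("chr1", "+", 900),
    ("chr1", "+", 1400),
    ("chr1", "-", 100),
]
example_clusters = cluster_tss(example_sites)
```

With the toy input, the two plus-strand sites at 100 and 300 fall into one cluster, while 900 and 1400 form a second one because the 600 bp gap from 300 exceeds the threshold and the 500 bp gap from 900 does not.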
Here we conducted an integrative multi-omics analysis to understand how cancers harbor various types of aberrations at the genomic, epigenomic and transcriptional levels. To elucidate the biological relevance of the aberrations and their mutual relations, we performed whole-genome sequencing, RNA-Seq, bisulfite sequencing and ChIP-Seq of 26 lung adenocarcinoma cell lines. The collected multi-omics data allowed us to associate an average of 536 coding mutations and 13,573 mutations in promoter or enhancer regions with aberrant transcriptional regulation. We detected 385 splice site mutations and 552 chromosomal rearrangements, representative cases of which were validated to cause aberrant transcripts. Averages of 61, 217, 3687 and 3112 mutations were located in the regulatory regions that showed differential DNA methylation, H3K4me3, H3K4me1 and H3K27ac marks, respectively. We detected distinct patterns of aberrations in transcriptional regulation depending on the gene. We found that irregular histone marks were characteristic of EGFR and CDKN1A, while a large genomic deletion and DNA hypermethylation were most frequent for CDKN2A. We also used the multi-omics data to classify the cell lines regarding their hallmarks of carcinogenesis. Our datasets should provide a valuable foundation for biological interpretations of interlaced genomic and epigenomic aberrations.
Combining our full-length cDNA method with massively parallel sequencing technology, we developed a simple method to collect precise positional information of transcriptional start sites (TSSs) together with digital information of the gene-expression levels in a high-throughput manner. We applied this method to observe gene-expression changes in a colon cancer cell line cultured in normoxic and hypoxic conditions. We generated more than 100 million 36-base TSS-tag sequences and revealed comprehensive features of hypoxia-responsive alterations in the transcriptional landscape of the human genome. The features include the presence of inducible ‘hot regions’ in 54 genomic regions, 220 novel hypoxia-inducible promoters that may drive non-protein-coding transcripts, 191 hypoxia-responsive alternative promoters and detailed views of 120 novel as well as known hypoxia-responsive genes. We further analyzed the hypoxic response of different cells using an additional 60 million TSS-tags and found that the degree of the gene-expression changes differed among cell lines, possibly reflecting cellular robustness against hypoxia. The novel dynamic figure of the human gene transcriptome will deepen our understanding of the transcriptional program of the human genome and bring new insights into the biology of cancer cells in hypoxia.
Although the knowledge accumulated on the transcriptional regulation of eukaryotes is significant, the knowledge on their translational regulation remains limited. Thus, we performed a comprehensive detection of the terminal oligo-pyrimidine (TOP) motif, one of the well-characterized cis-regulatory motifs for translational control, which is located immediately downstream of the transcriptional start sites of mRNAs. Utilizing our precise 5′-end information of the full-length cDNAs, we identified 1645 candidate TOP genes by a position-specific matrix search. These included not only 75 out of 78 ribosomal protein genes but also eight previously identified non-ribosomal-protein TOP genes. We further experimentally validated the translational activities of 83 TOP candidate genes. Clear translational regulation in response to stimulation with 12-O-tetradecanoyl-1-phorbol-13-acetate was observed for at least 41 of them, indicating that there should be a few hundred human genes that are subject to regulation at the translational level via TOPs. Our result suggests that TOP genes code not only for the formerly characterized ribosomal proteins and translation-related proteins but also for a wider variety of proteins, such as lysosome-related proteins and metabolism-related proteins, playing pivotal roles in gene expression controls in the majority of cellular mRNAs.
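The position-specific matrix search mentioned in the abstract above can be sketched as log-odds scoring of the first bases of each transcript. This is a hedged illustration rather than the authors' actual procedure: the training sequences are invented pyrimidine-rich strings, and the matrix length, pseudocount, uniform background, and function names (`build_log_odds`, `score_five_prime_end`) are all assumptions.

```python
import math

BASES = "ACGT"


def build_log_odds(training_seqs, pseudocount=1.0, background=0.25):
    """Build a log2-odds position-specific matrix from equal-length sequences.

    Assumptions: uniform background frequency and a fixed pseudocount per base.
    Returns a list with one {base: score} dict per position.
    """
    length = len(training_seqs[0])
    matrix = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for seq in training_seqs:
            counts[seq[i]] += 1
        total = sum(counts.values())
        matrix.append(
            {b: math.log2(counts[b] / total / background) for b in BASES}
        )
    return matrix


def score_five_prime_end(matrix, transcript):
    """Score the window that starts at the first base of the transcript.

    Only sequences made of A, C, G and T are supported in this sketch.
    """
    window = transcript[: len(matrix)].upper()
    if len(window) < len(matrix):
        raise ValueError("transcript is shorter than the matrix")
    return sum(column[base] for column, base in zip(matrix, window))


# Hypothetical TOP-like 5' ends (a C followed by pyrimidines); not real data.
toy_training = ["CTCTTTCC", "CCTTTCTC", "CTTTCCTT"]
toy_matrix = build_log_odds(toy_training)

top_like_score = score_five_prime_end(toy_matrix, "CTTTCCTTGAGG")
purine_rich_score = score_five_prime_end(toy_matrix, "GAAGATGGCATC")
```

In this toy setting the pyrimidine-rich 5′ end scores above zero and the purine-rich one below zero; a real screen would need a matrix trained on known TOP genes and a score threshold chosen from data.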