As a basis for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at ∼58%, compared with a peak at ∼42% for predicted cDNAs. Thus, current gene prediction procedures seem to have a slight bias against GC-rich transcripts. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and peaked at ∼42%, relatively low compared with that of protein-coding cDNAs.
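The two sequence properties used above to partition the FLJ cDNAs, GC content and the presence of an obvious ORF, can be sketched in Python as follows. This is a minimal illustration, not the FLJ pipeline: the forward-strand, three-frame ATG-to-stop scan, the 100-codon cutoff (`MIN_ORF_CODONS`), the 2% histogram bin width, and all function names are hypothetical choices made for this example.

```python
from collections import Counter
from typing import Iterable

# Hypothetical parameters for illustration; not taken from the FLJ study.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
MIN_ORF_CODONS = 100  # assumed cutoff for an "obvious" ORF


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop included) of the longest ATG-to-stop ORF found
    in the three forward frames. ORFs lacking a stop codon are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i + 3 - start) // 3)
                start = None
    return best


def classify(seq: str) -> str:
    """Label a cDNA by whether it carries an ORF of at least MIN_ORF_CODONS."""
    if longest_orf_codons(seq) >= MIN_ORF_CODONS:
        return "coding"
    return "candidate noncoding"


def gc_mode_percent(seqs: Iterable[str], bin_pct: int = 2) -> int:
    """Lower edge (in %) of the most populated GC-content histogram bin.
    Integer arithmetic avoids floating-point bin-edge errors; empty
    sequences are skipped, and at least one non-empty sequence is required."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        if s:
            gc = s.count("G") + s.count("C")
            counts[gc * 100 // (len(s) * bin_pct) * bin_pct] += 1
    return counts.most_common(1)[0][0]
```

The GC-content peak reported for each class corresponds to the modal bin returned by `gc_mode_percent`; an actual annotation pipeline would typically also scan the reverse strand and weigh sequence similarity rather than rely on ORF length alone.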
By analyzing 1,780,295 5′-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were grouped into 30,964 clusters separated from each other by more than 500 bp, which are therefore very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with a ratio of one CpG-island-containing promoter to 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of tissue-specific PAPs were testis and brain. It was also intriguing that genes with PAPs were enriched among genes encoding signal transduction-related proteins and were rarer among genes encoding extracellular proteins, possibly reflecting the varied functional requirements of the former and the restricted expression of the latter. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different splicing types of first exons per locus, partly produced by the PAPs, suggesting that this mechanism can generate a wide variety of transcripts. Our findings suggest that the use of alternative promoters, and the consequent alternative use of first exons, should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans. [Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to DDBJ under accession nos. DA000001-DA999999, DB000001-DB294747, DB294748-DB384947, BP192706-BP383670, AU279383-AU280837, and AU116788-AU160826.]
One of the most striking findings revealed by the Human Genome Project is that the human genome contains only 20,000-25,000 kinds of protein-coding genes (International Human Genome Sequencing Consortium 2004). This number is unexpectedly small compared with the total gene numbers in yeast, fly, and worm genomes, which are estimated to be 6,000, 14,000, and 19,000, respectively (Goffeau et al. 1996; C. elegans Sequencing Consortium 1998; Adams et al. 2000). Factors beyond gene number alone are presumably required to enable the human genome to build such highly elaborate systems as the brain and the immune system. To explain this, it has been hypothesized that multifaceted use of the genes should play a pivotal role in functional
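The promoter-clustering criterion described in the TSS abstract (TSS clusters separated by more than 500 bp) can be illustrated with a simple gap-based, single-linkage grouping of sorted positions. This is a hedged sketch: the abstract states only the 500-bp separation, so the grouping rule, the assumption that all positions come from one strand of one locus, and the names `cluster_tss` and `MAX_GAP_BP` are illustrative.

```python
from typing import List, Tuple

# Adjacent TSSs more than this many bp apart start a new cluster (from the text).
MAX_GAP_BP = 500


def cluster_tss(positions: List[int], max_gap: int = MAX_GAP_BP) -> List[Tuple[int, int, int]]:
    """Group TSS positions (assumed: same chromosome, strand, and locus) into
    clusters; returns (first_position, last_position, n_tss) per cluster."""
    if not positions:
        return []
    pos = sorted(positions)
    clusters = []
    start = prev = pos[0]
    n = 1
    for p in pos[1:]:
        if p - prev > max_gap:
            # Gap exceeds the threshold: close the current cluster.
            clusters.append((start, prev, n))
            start, n = p, 0
        n += 1
        prev = p
    clusters.append((start, prev, n))
    return clusters
```

For example, TSSs at 100, 150, 900, 1000 and 2000 fall into three clusters spanning 100-150, 900-1000 and 2000-2000. Under this reading, a locus yielding two or more clusters would be one with putative alternative promoters.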
The complete sequence of the genome of a hyperthermophilic archaebacterium, Pyrococcus horikoshii OT3, has been determined by assembling the sequences of the physical map-based contigs of fosmid clones and of long polymerase chain reaction (PCR) products used for gap-filling. The entire genome is 1,738,505 bp long. The authenticity of the entire genome sequence was supported by restriction analysis of long PCR products amplified directly from the genomic DNA. A total of 2061 open reading frames (ORFs) were assigned as potential protein-coding regions; by similarity search against public databases, 406 (19.7%) were related to genes with putative function and 453 (22.0%) to registered sequences of unknown function. The remaining 1202 ORFs (58.3%) did not show any significant similarity to sequences in the databases. Sequence comparison among the assigned ORFs provided evidence that a considerable number of ORFs were generated by sequence duplication. By similarity search, 11 ORFs were predicted to contain intein elements. The RNA genes identified were a single 16S-23S rRNA operon, two 5S rRNA genes, and 46 tRNA genes, two of which contain introns. The assigned ORFs and RNA-coding regions together occupy 91.25% of the genome. The data presented in this paper are available on the internet at http://www.nite.go.jp.
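As a quick arithmetic check on the ORF breakdown above, the three similarity classes sum to the 2061 assigned ORFs, and their shares round to the reported percentages. The category labels are paraphrases and the function name is hypothetical, chosen for this sketch.

```python
from typing import Dict

# ORF counts per similarity class, as reported for the P. horikoshii OT3 genome.
ORF_COUNTS: Dict[str, int] = {
    "related to genes with putative function": 406,
    "similar to registered sequences of unknown function": 453,
    "no significant database similarity": 1202,
}


def percent_breakdown(counts: Dict[str, int], digits: int = 1) -> Dict[str, float]:
    """Return each class's share of the total, in percent, rounded to `digits`."""
    total = sum(counts.values())
    return {name: round(100 * n / total, digits) for name, n in counts.items()}


if __name__ == "__main__":
    print(sum(ORF_COUNTS.values()))  # total number of assigned ORFs
    for name, pct in percent_breakdown(ORF_COUNTS).items():
        print(f"{pct:5.1f}%  {name}")
```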
The nuclear import of the spliceosomal snRNPs U1, U2, U4 and U5 depends on a complex nuclear localization signal (NLS), composed of the 5'-2,2,7-terminal trimethylguanosine (m3G) cap structure of the U snRNA and the Sm core domain. Here, we describe the isolation and cDNA cloning of a 45 kDa protein, termed snurportin1, which interacts specifically with m3G-cap but not m7G-cap structures. Snurportin1 enhances the m3G-cap-dependent nuclear import of U snRNPs in both Xenopus laevis oocytes and digitonin-permeabilized HeLa cells, demonstrating that it functions as an snRNP-specific nuclear import receptor. Interestingly, only the m3G cap, and not the Sm core NLS, appears to be recognized by snurportin1, indicating that at least two distinct import receptors interact with the complex snRNP NLS. Snurportin1 represents a novel nuclear import receptor that contains an N-terminal importin beta-binding (IBB) domain, essential for function, and a C-terminal m3G-cap-binding region with no structural similarity to the arm repeat domain of importin alpha.
In order to elucidate the roles of 2'-O-methylation of pyrimidine nucleotide residues in tRNAs, the conformations of 2'-O-methyluridylyl(3'→5')uridine (UmpU), 2'-O-methyluridine 3'-monophosphate (Ump), and 2'-O-methyluridine (Um) in ²H₂O solution were analyzed by one- and two-dimensional proton NMR spectroscopy and compared with those of related nucleotides and nucleosides. Comparing UpU and UmpU, 2'-O-methylation was found to stabilize the C3'-endo form of the 3'-nucleotidyl unit (Up-/Ump- moiety). This stabilization is primarily an intraresidue effect, since the conformation of the 5'-nucleotidyl unit (-pU moiety) was only slightly affected by 2'-O-methylation of the 3'-nucleotidyl unit. In fact, even for the mononucleotides Up and Ump, 2'-O-methylation significantly stabilizes the C3'-endo form, by 0.8 kcal/mol. By contrast, for the nucleosides U and Um, the C3'-endo form is stabilized only slightly, by 0.1 kcal/mol. Accordingly, the stabilization of the C3'-endo form by 2'-O-methylation is primarily due to steric repulsion among the 2-carbonyl group, the 2'-O-methyl group, and the 3'-phosphate group in the C2'-endo form. For some tRNA species, 2-thiolation of pyrimidine residues is found at positions where 2'-O-methylation is found in other tRNA species. (ABSTRACT TRUNCATED AT 250 WORDS)
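To put the reported energies in perspective, stabilizing one conformer by a free-energy difference ΔΔG multiplies the C3'-endo/C2'-endo equilibrium constant by exp(ΔΔG/RT). The sketch below evaluates that factor; it assumes the 0.8 and 0.1 kcal/mol values are free-energy differences and uses 25 °C (298.15 K), a temperature not stated in the abstract. The function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal/(mol*K)


def equilibrium_factor(ddg_kcal_per_mol: float, temp_k: float = 298.15) -> float:
    """Factor by which stabilizing one conformer by ddG (kcal/mol) multiplies
    K = [C3'-endo]/[C2'-endo], i.e. exp(ddG / RT). 298.15 K is an assumption."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temp_k))


if __name__ == "__main__":
    for label, ddg in [("Up -> Ump", 0.8), ("U -> Um", 0.1)]:
        print(f"{label}: K increases about {equilibrium_factor(ddg):.2f}-fold")
```

At 25 °C this gives roughly a 3.9-fold increase for the mononucleotides versus about 1.2-fold for the nucleosides, consistent with the abstract describing the former effect as significant and the latter as slight.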