We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
Diamond-Blackfan anemia (DBA), a congenital bone-marrow-failure syndrome, is characterized by red blood cell aplasia, macrocytic anemia, clinical heterogeneity, and increased risk of malignancy. Although anemia is the most prominent feature of DBA, the disease is also characterized by growth retardation and congenital anomalies that are present in approximately 30%-50% of patients. The disease has been associated with mutations in four ribosomal protein (RP) genes, RPS19, RPS24, RPS17, and RPL35A, in about 30% of patients. However, the genetic basis of the remaining 70% of cases is still unknown. Here, we report the second known mutation in RPS17 and probable pathogenic mutations in three more RP genes, RPL5, RPL11, and RPS7. In addition, we identified rare variants of unknown significance in three other genes, RPL36, RPS15, and RPS27A. Remarkably, careful review of the clinical data showed that mutations in RPL5 are associated with multiple physical abnormalities, including craniofacial, thumb, and heart anomalies, whereas isolated thumb malformations are predominantly present in patients carrying mutations in RPL11. We also demonstrate that mutations of RPL5, RPL11, or RPS7 in DBA cells are associated with diverse defects in the maturation of ribosomal RNAs in the large or the small ribosomal subunit production pathway, expanding the repertoire of ribosomal RNA processing defects associated with DBA.
We have identified an intergenic transcriptional activity that is located between the human HOXA1 and HOXA2 genes, shows myeloid-specific expression, and is upregulated during granulocytic differentiation. The novel gene, termed HOTAIRM1 (HOX antisense intergenic RNA myeloid 1), is transcribed antisense to the HOXA genes and originates from the same CpG island that embeds the start site of ...
Introduction
Human HOX gene clusters are known for the prevalence of intergenic transcription between coding genic members. [1] Similar activity has also been observed in other developmentally important or tissue-specific gene loci, such as those containing the human beta globin genes, cardiac myosin heavy chain genes, and the interleukin-4 (IL-4)/IL-13 gene cluster. [2-4] Extensive HOX gene cluster intergenic transcripts have been described largely as noncoding RNAs (ncRNAs), including both short microRNA (miRNA) species and long ncRNAs that are antisense to their canonical HOX neighbors. Well-defined HOX region ncRNAs include the mir-10 and mir-196 paralogs, bithoraxoid ncRNAs of the Drosophila bithorax complex, and human HOX antisense intergenic RNA (HOTAIR). [5-7] Intergenic regions have been proposed as locations for novel radiational and reorganizing changes that have occurred in the evolution of HOX gene clusters, which are relatively constrained in structure in the higher vertebrates. [5,8] Several recent studies have focused on expression of intergenic ncRNAs in the human HOX regions, especially the HOXA cluster, in tumor cell lines, tissues, and fibroblasts from different anatomic origins. All reported unusually active transcription within the intergenic regions, occurring in patterns coordinated with their HOX neighbors. [7,9,10] Intergenic ncRNAs in the HOXA gene cluster were usually associated with CpG islands, and their expression accompanied changes in either polycomb group repressive complex binding or methylation of histones, suggesting a pattern of cis modulation of the intergenic transcripts before the activation of adjacent HOX genes. However, the HOTAIR transcript, located between HOXC11 and HOXC12, was found to function in trans to repress a distal group of homologous HOXD genes by demarcating an extended silenced domain through interaction with the polycomb group complex PRC2 histone methyltransferase. [7,10,11]
De novo genomic transcription mapping has revealed that intergenic ncRNA is possibly the most abundant form of transcriptional output from the genomes of humans and other higher eukaryotic organisms. [12,13] Within the human genome, the majority of intergenic ncRNAs are not highly conserved at the sequence level, with long ncRNAs generally less conserved than short miRNAs. Nevertheless, their expression patterns may be conserved among tissues or along developmental axes. [14,15] More importantly, ncRNA function in gene regulation has emerged as an important mechanism in the control of many biologic processes in development and carcinogenesis. [16] In the present study, we have identified intergenic transc...
The gene that is abnormal in the X-linked form of the phagocytic disorder chronic granulomatous disease has been cloned without reference to a specific protein by relying on its chromosomal map position. The transcript of the gene is expressed in the phagocytic lineage of haematopoietic cells and is absent or structurally abnormal in four patients with the disorder. The nucleotide sequence of complementary DNA clones predicts a polypeptide of at least 468 amino acids with no homology to proteins described previously.