The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
As studies of DNA methylation increase in scope, it has become evident that methylation has a complex relationship with gene expression, plays an important role in defining cell types, and is disrupted in many diseases. We describe large-scale single-base resolution DNA methylation profiling on a diverse collection of 82 human cell lines and tissues using reduced representation bisulfite sequencing (RRBS). Analysis integrating RNA-seq and ChIP-seq data illuminates the functional role of this dynamic mark. Loci that are hypermethylated across cancer types are enriched for sites bound by NANOG in embryonic stem cells, which supports and expands the model of a stem/progenitor cell signature in cancer. CpGs that are hypomethylated across cancer types are concentrated in megabase-scale domains that occur near the telomeres and centromeres of chromosomes, are depleted of genes, and are enriched for cancer-specific EZH2 binding and H3K27me3 (repressive chromatin). In noncancer samples, there are cell-type specific methylation signatures preserved in primary cell lines and tissues as well as methylation differences induced by cell culture. The relationship between methylation and expression is context-dependent, and we find that CpG-rich enhancers bound by EP300 in the bodies of expressed genes are unmethylated despite the dense gene-body methylation surrounding them. Non-CpG cytosine methylation occurs in human somatic tissue, is particularly prevalent in brain tissue, and is reproducible across many individuals. This study provides an atlas of DNA methylation across diverse and well-characterized samples and enables new discoveries about DNA methylation and its role in gene regulation and disease.
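As a rough illustration of what "single-base resolution" means for bisulfite data such as RRBS: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C, so a per-site methylation level can be estimated as the fraction of reads that retain C. The sketch below is a minimal example under that assumption and is not the authors' pipeline; the function name, the `min_coverage` parameter, and the read counts are hypothetical.

```python
from typing import Optional


def methylation_fraction(c_reads: int, t_reads: int,
                         min_coverage: int = 1) -> Optional[float]:
    """Estimate the methylation level at one cytosine.

    c_reads: reads still showing C at the site (protected, i.e. methylated).
    t_reads: reads showing T at the site (converted, i.e. unmethylated).
    min_coverage: hypothetical threshold; sites with fewer reads return None.
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = c_reads + t_reads
    if total == 0 or total < min_coverage:
        return None  # not enough coverage to estimate a level
    return c_reads / total


# Hypothetical read counts for three CpG sites: (C reads, T reads).
sites = {"cpg_1": (8, 2), "cpg_2": (0, 10), "cpg_3": (0, 0)}
levels = {name: methylation_fraction(c, t) for name, (c, t) in sites.items()}
```

Returning `None` for uncovered sites, rather than 0, keeps "no data" distinct from "unmethylated", which matters when comparing samples.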
CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environments. Here we analyze genome-wide occupancy patterns of CTCF by ChIP-seq in 19 diverse human cell types, including normal primary cells and immortal lines. We observed highly reproducible yet surprisingly plastic genomic binding landscapes, indicative of strong cell-selective regulation of CTCF occupancy. Comparison with massively parallel bisulfite sequencing data indicates that 41% of variable CTCF binding is linked to differential DNA methylation, concentrated at two critical positions within the CTCF recognition sequence. Unexpectedly, CTCF binding patterns were markedly different in normal versus immortal cells, with the latter showing widespread disruption of CTCF binding associated with increased methylation. Strikingly, this disruption is accompanied by up-regulation of CTCF expression, with the result that both normal and immortal cells maintain the same average number of CTCF occupancy sites genome-wide. These results reveal a tight linkage between DNA methylation and the global occupancy patterns of a major sequence-specific regulatory factor.
Most human transcription factors bind a small subset of potential genomic sites and often use different subsets in different cell types. To identify mechanisms that govern cell type-specific transcription factor binding, we used an integrative approach to study estrogen receptor α (ER). We found that ER exhibits two distinct modes of binding. Shared sites, bound in multiple cell types, are characterized by high-affinity estrogen response elements (EREs), inaccessible chromatin and a lack of DNA methylation, while cell-specific sites are characterized by a lack of EREs, co-occurrence with other transcription factors and cell type-specific chromatin accessibility and DNA methylation. These observations enabled accurate quantitative models of ER binding that suggest tethering of ER to one-third of cell-specific sites. The distinct properties of cell-specific binding were also observed with glucocorticoid receptor and for ER in primary mouse tissues, representing an elegant genomic encoding scheme for generating cell type-specific gene regulation.
The closely linked human protocadherin (Pcdh) α, β, and γ gene clusters encode 53 distinct protein isoforms, which are expressed in a combinatorial manner to generate enormous diversity on the surface of individual neurons. This diversity is a consequence of stochastic promoter choice and alternative pre-mRNA processing. Here, we show that Pcdhα promoter choice is achieved by DNA looping between two downstream transcriptional enhancers and individual promoters driving the expression of alternate Pcdhα isoforms. In addition, we show that this DNA looping requires specific binding of the CTCF/cohesin complex to two symmetrically aligned binding sites in both the transcriptionally active promoters and in one of the enhancers. These findings have important implications regarding enhancer/promoter interactions in the generation of complex Pcdh cell surface codes for the establishment of neuronal identity and self-avoidance in individual neurons.
The clustered protocadherin (Pcdh) genes are expressed in the nervous system and organized into three closely linked clusters (α, β, and γ) (1-5). The human Pcdhα gene cluster contains 13 highly similar variable first exons (α1 to α13) arrayed in tandem and two more distantly related c-type variable first exons designated αc1 and αc2 (Fig. 1A). The variable first exons encode the extracellular, transmembrane, and juxtamembrane intracellular domains of the Pcdhα proteins. Each of these 15 variable first exons is cis-spliced to a single set of three downstream constant exons that encode a distal intracellular domain (1-3). The human β cluster is located downstream from the α cluster and contains a tandem array of 16 highly similar variable exons but with no constant exons, whereas the γ cluster contains 22 variable first exons arrayed in tandem and divided into three types (γa1 to γa12, γb1 to γb7, and γc3 to γc5) (Fig. 1D).
As in the case of the α cluster, each of these 22 γ variable first exons is cis-spliced to a single set of three downstream constant exons, which are distinct from the α constant exons, to generate diverse γ mRNAs (1, 3, 6). Analyses of the α and γ transcripts have revealed that highly similar Pcdh alternate isoforms are expressed in a stochastic fashion, whereas all of the c-type divergent isoforms, αc1 and αc2 in the α cluster and γc3, γc4, and γc5 in the γ cluster, are expressed ubiquitously in all cells (1-3, 5, 7). Hereafter, we refer to the c-type genes as "ubiquitously expressed" in contrast to the "alternately expressed" Pcdh genes (Fig. 1 A and D). A combination of stochastic activation of alternate promoters and constitutive activation of c-type ubiquitous promoters generates enormous single-cell diversity on the surface of individual neurons.
Significant advances have been made in understanding the mechanisms by which individual neurons express distinct combinations of the clustered Pcdh genes (2, 3, 8-10). Two long-range cis-regulatory elements in the α cluster, HS5-1 and HS7 (hypersensitive sites 5-1 and 7), function as developmental and tissue...
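The exon counts quoted above can be checked against the stated total of 53 isoforms. The following short tally uses only the numbers given in the text; the variable names and dictionary layout are illustrative choices, not anything taken from the paper.

```python
# Variable-exon counts per cluster, as stated in the text.
pcdh_clusters = {
    "alpha": {"alternate (α1-α13)": 13, "c-type (αc1, αc2)": 2},
    "beta": {"variable": 16},
    "gamma": {"γa1-γa12": 12, "γb1-γb7": 7, "c-type (γc3-γc5)": 3},
}

# Each variable exon yields one distinct isoform, so summing exons
# per cluster and then across clusters gives the isoform count.
cluster_totals = {name: sum(groups.values())
                  for name, groups in pcdh_clusters.items()}
total_isoforms = sum(cluster_totals.values())

# c-type exons are the "ubiquitously expressed" genes; the rest are
# "alternately expressed" (the β exons are counted here as alternate,
# which is an assumption of this sketch, since the text classifies
# only the α and γ transcripts explicitly).
ubiquitous = sum(count
                 for groups in pcdh_clusters.values()
                 for label, count in groups.items()
                 if label.startswith("c-type"))
alternate = total_isoforms - ubiquitous

print(cluster_totals, total_isoforms, ubiquitous, alternate)
```

The per-cluster totals (15, 16, and 22) match the figures in the text and sum to the 53 isoforms cited in the abstract.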