The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
The extent to which variation in chromatin structure and transcription factor binding may influence gene expression, and thus underlie or contribute to variation in phenotype, is unknown. To address this question, we cataloged both individual-to-individual variation and differences between homologous chromosomes within the same individual (allele-specific variation) in chromatin structure and transcription factor binding in lymphoblastoid cells derived from individuals of geographically diverse ancestry. Ten percent of active chromatin sites were individual-specific; a similar proportion were allele-specific. Both individual-specific and allele-specific sites were commonly transmitted from parent to child, which suggests that they are heritable features of the human genome. Our study shows that heritable chromatin status and transcription factor binding differ as a result of genetic variation and may underlie phenotypic variation in humans.
Cell-type diversity is governed in part by differential gene expression programs mediated by transcription factor (TF) binding. However, there are few systematic studies of the genomic binding of different types of TFs across a wide range of human cell types, especially in relation to gene expression. In the ENCODE Project, we have identified the genomic binding locations across 11 different human cell types of CTCF, RNA Pol II (RNAPII), and MYC, three TFs with diverse roles. Our data and analysis revealed how these factors bind in relation to genomic features and shape gene expression and cell-type specificity. CTCF bound predominantly in intergenic regions while RNAPII and MYC preferentially bound to core promoter regions. CTCF sites were relatively invariant across diverse cell types, while MYC showed the greatest cell-type specificity. MYC and RNAPII co-localized at many of their binding sites and putative target genes. Cell-type specific binding sites, in particular for MYC and RNAPII, were associated with cell-type specific functions. Patterns of binding in relation to gene features were generally conserved across different cell types. RNAPII occupancy was higher over exons than adjacent introns, likely reflecting a link between transcriptional elongation and splicing. TF binding was positively correlated with the expression levels of their putative target genes, but combinatorial binding, in particular of MYC and RNAPII, was even more strongly associated with higher gene expression. These data illuminate how combinatorial binding of transcription factors in diverse cell types is associated with gene expression and cell-type specific biology.
Bromodomain proteins (BRD) are key chromatin regulators of genome function and stability as well as therapeutic targets in cancer. Here, we systematically delineate the contribution of human BRD proteins to genome stability and DNA double-strand break (DSB) repair using several cell-based assays and proteomic interaction network analysis. Applying these approaches, we identify 24 of the 42 BRD proteins as promoters of DNA repair and/or genome integrity. We identified a BRD-reader function of PCAF that bound TIP60-mediated histone acetylations at DSBs to recruit a DUB complex to deubiquitylate histone H2BK120, allowing direct acetylation by PCAF and repair of DSBs by homologous recombination. We also discovered the bromo-and-extra-terminal (BET) BRD proteins, BRD2 and BRD4, as negative regulators of transcription-associated RNA-DNA hybrids (R-loops) as inhibition of BRD2 or BRD4 increased R-loop formation, which generated DSBs. These breaks were reliant on topoisomerase II, and BRD2 directly bound and activated topoisomerase I, a known restrainer of R-loops. Thus, comprehensive interactome and functional profiling of BRD proteins revealed new homologous recombination and genome stability pathways, providing a framework to understand genome maintenance by BRD proteins and the effects of their pharmacological inhibition.
Understanding the relationships between regulatory factor binding, chromatin structure, cis-regulatory elements and RNA-regulation mechanisms relies on accurate information about transcription start sites (TSS) and polyadenylation sites (PAS). Although several approaches have identified transcript ends in yeast, limitations of resolution and coverage have remained, and definitive identification of TSS and PAS with single-nucleotide resolution has not yet been achieved. We developed SMORE-seq (simultaneous mapping of RNA ends by sequencing) and used it to simultaneously identify the strongest TSS for 5207 (90%) genes and PAS for 5277 (91%) genes. The new transcript annotations identified by SMORE-seq showed improved distance relationships with TATA-like regulatory elements, nucleosome positions and active RNA polymerase. We found 150 genes whose TSS were downstream of the annotated start codon, and additional analysis of evolutionary conservation and ribosome footprinting suggests that these protein-coding sequences are likely to be mis-annotated. SMORE-seq detected short non-coding RNAs transcribed divergently from more than a thousand promoters in wild-type cells under normal conditions. These divergent non-coding RNAs were less evident at promoters containing canonical TATA boxes, suggesting a model where transcription initiation at promoters by RNAPII is bidirectional, with TATA elements serving to constrain the directionality of initiation.