Transcription factors are DNA-binding proteins that have key roles in gene regulation 1,2. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes 3-6. However, only a minority of the more than 1,600 transcription factors encoded in the human genome has been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP-seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium.
There are an estimated 1,639 transcription factors (TFs) in the human genome 2, and up to 2,500 CAPs when we include transcriptional cofactors, RNA polymerase-associated proteins, histone-binding regulators, and chromatin-modifying enzymes 1,7.
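The idea of a motif being enriched in the ChIP-seq peaks of another factor can be illustrated with a minimal sketch. This is not the ENCODE analysis pipeline: the exact-match consensus `TGTTTAC` used as a forkhead-like core, the toy sequences, the function names, and the choice of a hypergeometric tail test are all assumptions made here for illustration. Real analyses typically score position weight matrices against matched background regions.

```python
from math import comb

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def has_motif(seq, motif):
    """True if the motif occurs on either strand (exact match only)."""
    seq = seq.upper()
    return motif in seq or revcomp(motif) in seq

def enrichment_p(k, n, K, N):
    """Hypergeometric upper tail: P(X >= k) when n regions are drawn
    from N total regions, K of which contain the motif."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

def motif_enrichment(peaks, background, motif):
    """Count motif-containing peaks and test them against peaks + background."""
    regions = peaks + background
    k = sum(has_motif(s, motif) for s in peaks)
    K = sum(has_motif(s, motif) for s in regions)
    return k, K, enrichment_p(k, len(peaks), K, len(regions))

# Hypothetical toy data: peak sequences for some other CAP and background regions.
peaks = ["ACGTGTTTACGG", "TTGTAAACATCC", "GGTGTTTACAAA", "CCCCGGGGAAAA"]
background = ["ACGTACGTACGT", "GGGGCCCCGGGG", "ATATATATATAT",
              "CCGGTGTTTACC", "TTTTGGGGCCCC", "AGAGAGAGAGAG"]

k, K, p = motif_enrichment(peaks, background, "TGTTTAC")
print(f"{k} of {len(peaks)} peaks contain the motif; p = {p:.3f}")
```

A small p-value here would only suggest co-occurrence of the motif with the assayed factor's peaks; it does not by itself distinguish direct binding from co-occupancy with another TF.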
A typical TF binds to a short DNA sequence motif, and, in vivo, some TFs exhibit additional chromosomal occupancy mediated by their interactions with other CAPs 8-10. CAPs are vital for orchestrating cell type- and cell state-specific gene regulation, including the temporal coordination of gene expression in developmental processes, environmental responses, and disease states 3-6,11-13. Identifying genomic regions with which a TF is physically associated, referred to as TF binding sites (TFBSs), is an important step towards understanding its biological roles. The most common genome-wide assay for identifying TFBSs is ChIP-seq 14-16. In addition to highlighting potentially active regulatory DNA elements by direct measurement, ChIP-seq data can define DNA sequence motifs that can be used, often in conjunction with expression data and chromatin accessibility maps, to infer likely binding events in other cellular contexts without performing direct assays. Although motifs identified by ChIP-seq are often representative of direct binding, this is not always the case, as co-occurrence of other TFs could ...
In mammalian embryogenesis, differential gene expression gradually builds the identity and complexity of each tissue and organ system. We systematically quantified mouse polyA-RNA from embryonic day E10.5 to birth, sampling 17 whole tissues, enhanced with single-cell measurements for the developing limb. The resulting developmental transcriptome is globally structured by dynamic cytodifferentiation, body-axis and cell-proliferation gene sets, characterized by their promoters' transcription factor (TF) motif codes. We decomposed the tissue-level transcriptome using scRNA-seq and found that neurogenesis and haematopoiesis dominate at both the gene and cellular levels, jointly accounting for 1/3 of differential gene expression and over 40% of identified cell types. Integrating promoter sequence motifs with companion ENCODE epigenomic profiles identified a promoter de-repression mechanism unique to neuronal expression clusters and attributable to known and novel repressors. Focusing on the developing limb, scRNA-seq identified 25 known and candidate novel cell types, including progenitor and differentiating states with computationally inferred lineage relationships. We extracted cell-type TF networks and complementary sets of candidate enhancer elements by de-convolving whole-tissue IDEAS epigenome chromatin state models. These ENCODE reference data, computed network components and IDEAS chromatin segmentations, are companion resources to the matching epigenomic developmental matrix, available for researchers to further mine and integrate.
DNA-associated proteins (DAPs) regulate gene expression by binding to regulatory loci such as enhancers or promoters. An understanding of how DAPs cooperate at regulatory loci is essential to deciphering how these regions contribute to normal development and disease. In this study, we aggregated publicly available ChIP-seq data from 469 human DNA-associated proteins assayed in three cell lines and integrated these data with an orthogonal dataset of 352 non-redundant, in vitro-derived motifs mapped to the genome within DNase hypersensitivity footprints in an effort to characterize regions of the genome that have exceptionally high numbers of DAP associations. We subsequently performed a massively parallel mutagenesis assay to discover the key sequence elements driving transcriptional activity at these loci and explored plausible biological mechanisms underlying their formation. We establish a generalizable definition for High Occupancy Target (HOT) loci and identify putative driver DAP motifs, including HNF4A, SP1, SP5, and ETV4, that are highly prevalent and exhibit sequence conservation at HOT loci. We also found that the number of DAP associations is positively associated with evidence of regulatory activity and, by systematically mutating 245 HOT loci in our massively parallel reporter assay, localized regulatory activity in these loci to a central core region that is dependent on the motif sequences of our previously nominated driver DAPs. In sum, our work leverages the increasingly large number of DAP motif and ChIP-seq data publicly available to explore how DAP associations contribute to genome-wide transcriptional regulation.
3 provided an increasingly rich set of clues to the locations and physical connections among such elements. Nevertheless, these biochemical signatures cannot yet accurately predict the presence or amount of regulatory activity of the underlying DNA.
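As a rough illustration of what counting DAP associations per locus involves, the sketch below tallies the distinct DAPs whose ChIP-seq peaks overlap each locus and flags loci above a cutoff. The excerpt does not state the study's actual HOT definition, so the fractional threshold (`min_fraction`), the half-open interval convention, the function names, and the toy coordinates are all hypothetical.

```python
from collections import defaultdict

def count_daps_per_locus(loci, peaks):
    """Count distinct DAPs with at least one peak overlapping each locus.

    loci:  list of (chrom, start, end), half-open intervals
    peaks: list of (chrom, start, end, dap_name), half-open intervals
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, dap in peaks:
        by_chrom[chrom].append((start, end, dap))

    counts = {}
    for chrom, start, end in loci:
        # A set ensures each DAP is counted once, however many peaks it has.
        daps = {dap for p_start, p_end, dap in by_chrom[chrom]
                if p_start < end and start < p_end}
        counts[(chrom, start, end)] = len(daps)
    return counts

def flag_hot(counts, n_assayed, min_fraction):
    """Flag loci bound by at least min_fraction of assayed DAPs.

    min_fraction is a hypothetical cutoff, not the published definition.
    """
    return [locus for locus, n in counts.items() if n / n_assayed >= min_fraction]

# Hypothetical toy data; the DAP names are taken from the abstract above.
loci = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
peaks = [
    ("chr1", 150, 250, "HNF4A"),
    ("chr1", 200, 400, "SP1"),
    ("chr1", 90, 110, "ETV4"),
    ("chr1", 120, 180, "HNF4A"),   # second HNF4A peak, counted once
    ("chr1", 1100, 1150, "SP1"),
    ("chr2", 300, 400, "SP5"),     # does not overlap the chr2 locus
]

counts = count_daps_per_locus(loci, peaks)
hot = flag_hot(counts, n_assayed=4, min_fraction=0.5)
print(counts)
print(hot)
```

The nested scan is quadratic in the worst case; for genome-scale peak sets an interval tree or a sorted sweep would be the usual substitute.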
There are many known and suspected reasons for this difficulty, including the relative strength, number of interacting partners, and redundancy of each element, each of which may modulate a locus' contribution to the native expression level(s) of its respective target gene(s) in a manner difficult to predict without direct experimentation (Roadmap Epigenomics Consortium 2015; The ENCODE Project Consortium 2007; Sanyal et al. 2012). In this manuscript, we present evidence that the total number of DNA-associated proteins (DAPs) that associate with a locus can act as a quantitative predictor of the locus' regulatory activity and that the activities of loci with large numbers of DAP associations can be disrupted in a predictable manner by altering subsets of putative "driver motifs".
Classically, regulatory loci are thought to be discriminately bound by a small subset of expressed transcription factors in a manner governed by each factor's DNA sequence preference, and additional proteins are recruited through specific protein-protein interactions (Mitchell and Tjian 1989). However, this model is becoming incongruent with observed DAP