The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to meta(data) from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.
The Saccharomyces Genome Database (SGD, http://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. The SGD project provides the highest-quality manually curated information from peer-reviewed literature. The experimental results reported in the literature are extracted and integrated within a well-developed database. These data are combined with quality high-throughput results and provided through Locus Summary pages, a powerful query engine and rich genome browser. The acquisition, integration and retrieval of these data allow SGD to facilitate experimental design and analysis by providing an encyclopedia of the yeast genome, its chromosomal features, their functions and interactions. Public access to these data is provided to researchers and educators via web pages designed for optimal ease of use.
Sequence preferences of DNA-binding proteins are a primary mechanism by which cells interpret the genome. Despite these proteins' central importance in physiology, development, and evolution, comprehensive DNA-binding specificities have been determined experimentally for few proteins. Here, we used microarrays containing all 10-base-pair sequences to examine the binding specificities of 104 distinct mouse DNA-binding proteins representing 22 structural classes. Our results reveal a complex landscape of binding, with virtually every protein analyzed possessing unique preferences. Roughly half of the proteins each recognized multiple distinctly different sequence motifs, challenging our molecular understanding of how proteins interact with their DNA binding sites. This complexity in DNA recognition may be important in gene regulation and in evolution of transcriptional regulatory networks.The interactions between transcription factors (TFs) and their DNA binding sites are an integral part of the gene regulatory networks that control development, core cellular processes, and responses to environmental perturbations. However, only a handful of sequence-specific TFs have been characterized well enough to identify all the sequences that they can and, just as importantly, can not bind. Computational analysis of microarray readout of chromatin immunoprecipitation experiments (ChIP-chip) suggests extensive use of low affinity binding sites in yeast (1), and computational models of gene expression during fly embryonic development suggest that low affinity binding sites contribute as much as high affinity sites (2).The availability of TF binding data spanning the full affinity range would improve our understanding of the biophysical phenomena underlying protein-DNA recognition, and would improve accuracy in analyzing cis regulatory elements. Here we report the comprehensive determination of the DNA binding specificities of 104 known and predicted mouse TFs using the universal protein binding microarray (PBM) technology (3). These TFs represent 22 different DNA binding domain (DBD) structural classes that are the major DBD classes found in metazoan TFs.We created (4) N-terminal GST fusion constructs of the DBDs of 104 known and predicted mouse TFs (Fig. S1 and Table S1). Five of these proteins -Max, Bhlhb2, Gata3, Rfx3, and Sox7 -were also represented as full-length fusions to N-terminal GST, yielding a total set of 109 non-redundant proteins represented by 115 samples (5). Each protein was used in two PBM experiments (6,7) (Figs. S2, S3, S4 and Table S2). DNA binding site motifs initially were derived using the Seed-and-Wobble algorithm (3,8); Seed-and-Wobble first identifies the single 8-mer (ungapped or gapped) with the greatest PBM enrichment score (E-score) (3), and then systematically tests the relative preference of each nucleotide variant at each position both within and outside the seed (5). Later analyses incorporated additional motif finding algorithms, including RankMotif++ (9) and Kafal (5).Beyond simpl...
Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.
Summary The sequence specificity of DNA-binding proteins is the primary mechanism by which the cell recognizes genomic features. Here, we describe systematic determination of yeast transcription factor DNA-binding specificities. We obtained binding specificities for 112 DNA-binding proteins representing 19 distinct structural classes, one-third of which have not been previously reported. Several newly discovered binding sequences have striking genomic distributions relative to transcription start sites, supporting their biological relevance and suggesting a role in promoter architecture. Among these are Rsc3 binding sequences, containing the core CGCG, which are found preferentially ~100 bp upstream of transcription start sites. Mutation of RSC3 results in a dramatic increase in nucleosome occupancy in hundreds of proximal promoters containing a Rsc3 binding element, but has little impact on promoters lacking Rsc3 binding sequences, indicating that Rsc3 plays a broad role in targeting nucleosome exclusion at yeast promoters.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.