We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.
Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns, recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest up to date collection of dinucleotide PWM models for 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the validation of the resulting models on in vivo data. To facilitate a practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
The current version of the Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org ) contains information about: (i) transcription factor binding sites (TFBSs) and transcription coactivators identified by ChIP-seq experiments for Homo sapiens, Mus musculus, Rattus norvegicus, Danio rerio, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Arabidopsis thaliana ; (ii) regions of open chromatin and TFBSs (DNase footprints) identified by DNase-seq; (iii) unmappable regions where TFBSs cannot be identified due to repeats; (iv) potential TFBSs for both human and mouse using position weight matrices from the HOCOMOCO database. Raw ChIP-seq and DNase-seq data were obtained from ENCODE and SRA, and uniformly processed. ChIP-seq peaks were called using four different methods: MACS, SISSRs, GEM and PICS. Moreover, peaks for the same factor and peak calling method, albeit using different experiment conditions (cell line, treatment, etc.), were merged into clusters. To reduce noise, such clusters for different peak calling methods were merged into meta-clusters; these were considered to be non-redundant TFBS sets. Moreover, extended quality control was applied to all ChIP-seq data. Web interface to access GTRD was developed using the BioUML platform. It provides browsing and displaying information, advanced search possibilities and an integrated genome browser.
GTRD—Gene Transcription Regulation Database (http://gtrd.biouml.org)—is a database of transcription factor binding sites (TFBSs) identified by ChIP-seq experiments for human and mouse. Raw ChIP-seq data were obtained from ENCODE and SRA and uniformly processed: (i) reads were aligned using Bowtie2; (ii) ChIP-seq peaks were called using peak callers MACS, SISSRs, GEM and PICS; (iii) peaks for the same factor and peak callers, but different experiment conditions (cell line, treatment, etc.), were merged into clusters; (iv) such clusters for different peak callers were merged into metaclusters that were considered as non-redundant sets of TFBSs. In addition to information on location in genome, the sets contain structured information about cell lines and experimental conditions extracted from descriptions of corresponding ChIP-seq experiments. A web interface to access GTRD was developed using the BioUML platform. It provides: (i) browsing and displaying information; (ii) advanced search possibilities, e.g. search of TFBSs near the specified gene or search of all genes potentially regulated by a specified transcription factor; (iii) integrated genome browser that provides visualization of the GTRD data: read alignments, peaks, clusters, metaclusters and information about gene structures from the Ensembl database and binding sites predicted using position weight matrices from the HOCOMOCO database.
The Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org/) contains uniformly annotated and processed NGS data related to gene transcription regulation: ChIP-seq, ChIP-exo, DNase-seq, MNase-seq, ATAC-seq and RNA-seq. With the latest release, the database has reached a new level of data integration. All cell types (cell lines and tissues) presented in the GTRD were arranged into a dictionary and linked with different ontologies (BRENDA, Cell Ontology, Uberon, Cellosaurus and Experimental Factor Ontology) and with related experiments in specialized databases on transcription regulation (FANTOM5, ENCODE and GTEx). The updated version of the GTRD provides an integrated view of transcription regulation through a dedicated web interface with advanced browsing and search capabilities, an integrated genome browser, and table reports by cell types, transcription factors, and genes of interest.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.