The arrangement of transcription factor (TF) binding motifs (syntax) is an important part of the cis-regulatory code, yet remains elusive. We introduce a deep learning model, BPNet, that uses DNA sequence to predict base-resolution ChIP-nexus binding profiles of pluripotency TFs. We develop interpretation tools to learn predictive motif representations and identify soft syntax rules for cooperative TF binding interactions. Strikingly, Nanog preferentially binds with helical periodicity, and TFs often cooperate in a directional manner, which we validate using CRISPR-induced point mutations. Our model represents a powerful general approach to uncover the motifs and syntax of cis-regulatory sequences in genomics data.
Genes are regulated through enhancer sequences, in which transcription factor binding motifs and their specific arrangements (syntax) form a cis-regulatory code. To understand the relationship between motif syntax and transcription factor binding, we train a deep learning model that uses DNA sequence to predict base-resolution binding profiles of four pluripotency transcription factors Oct4, Sox2, Nanog, and Klf4. We interpret the model to accurately map hundreds of thousands of motifs in the genome, learn novel motif representations and identify rules by which motifs and syntax influence transcription factor binding. We find that instances of strict motif spacing are largely due to retrotransposons, but that soft motif syntax influences motif interactions at protein and nucleosome range. Most strikingly, Nanog binding is driven by motifs with a strong preference for ~10.5 bp spacings corresponding to helical periodicity. Interpreting deep learning models applied to high-resolution binding data is a powerful and versatile approach to uncover the motifs and syntax of cis-regulatory sequences.
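As a rough illustration of what a preference for ~10.5 bp spacings means, the sketch below scores how strongly a set of motif-to-motif distances clusters at multiples of a given period. It is not the BPNet interpretation procedure described above; the function name `periodicity_strength` and the toy distance lists are assumptions made purely for illustration.

```python
import cmath
import math


def periodicity_strength(spacings, period=10.5):
    """Return a score in [0, 1] for how periodic the spacings are.

    Each spacing (in bp) is wrapped onto a circle whose circumference is
    `period`; the score is the length of the mean of the resulting unit
    vectors. Values near 1 mean spacings cluster at one phase of the
    period; values near 0 mean no preferred phase.
    """
    if not spacings:
        raise ValueError("spacings must be non-empty")
    total = sum(cmath.exp(2j * math.pi * d / period) for d in spacings)
    return abs(total) / len(spacings)


# Hypothetical toy data: distances near multiples of 10.5 bp ...
periodic = [round(10.5 * k) for k in range(1, 11)]
# ... versus every distance from 1 to 105 bp (ten full periods).
uniform = list(range(1, 106))

print(periodicity_strength(periodic))
print(periodicity_strength(uniform))
```

The first list scores close to 1 and the second close to 0, which is the contrast one would expect between helically phased and unphased motif pairs under this simplified measure.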
The relative importance of regulation at the mRNA versus protein level is subject to ongoing debate. To address this question in a dynamic system, we mapped proteomic and transcriptomic changes in mammalian cells responding to stress induced by dithiothreitol over 30 h. Specifically, we estimated the kinetic parameters for the synthesis and degradation of RNA and proteins, and deconvoluted the response patterns into components common and unique to each regulatory level using a new statistical tool. Overall, the two regulatory levels were equally important, but differed in their impact on molecule concentrations. Both mRNA and protein changes peaked between two and eight hours, but mRNA expression fold changes were much smaller than those of the proteins. mRNA concentrations shifted in a transient, pulse‐like pattern and returned to values close to pre‐treatment levels by the end of the experiment. In contrast, protein concentrations switched only once and established a new steady state, consistent with the dominant role of protein regulation during misfolding stress. Finally, we generated hypotheses on specific regulatory modes for some genes.
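For orientation, synthesis and degradation parameters of this kind are often written as a pair of first-order rate equations. The form below is a generic textbook sketch, not the statistical tool used in the study; the symbols $R$, $P$, $k_s^{R}$, $k_d^{R}$, $k_s^{P}$ and $k_d^{P}$ are assumptions introduced here.

```latex
\begin{aligned}
\frac{dR}{dt} &= k_s^{R} - k_d^{R}\, R(t), \\
\frac{dP}{dt} &= k_s^{P}\, R(t) - k_d^{P}\, P(t), \\
R^{*} &= \frac{k_s^{R}}{k_d^{R}}, \qquad
P^{*} = \frac{k_s^{P}}{k_d^{P}}\, R^{*}.
\end{aligned}
```

Here $R^{*}$ and $P^{*}$ are the steady-state concentrations obtained by setting both derivatives to zero. In this simplified picture, a transient change in $k_s^{R}$ or $k_d^{R}$ yields an mRNA pulse that relaxes back to $R^{*}$, while a lasting change in $k_s^{P}$ or $k_d^{P}$ moves the protein to a new steady state.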
bioRxiv preprint first posted online Nov. 26, 2015; doi: http://dx.doi.org/10.1101/032797. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Standfirst text
Using a new statistical tool to analyze time-series protein and matching mRNA concentration data, this study deconvoluted the contributions of mRNA and protein level regulation in the response of mammalian cells to stress of the endoplasmic reticulum.
- We quantified protein and mRNA concentrations for 3,235 genes across two replicates and time points, with a high-confidence dataset of 1,237 genes/mRNAs.
- We use a new statistical tool to quantify the contribution of regulatory processes, and we find that mRNA and protein level regulation play similarly important roles.
- mRNA and protein level regulation have different dynamics: mRNA concentrations spike in their change and return to pre-perturbation levels, while protein concentrations switch in their behavior and reach a new steady-state.
- We generated hypotheses on modes of regulation for several groups of genes.

Abstract
The relative importance of regulation at the mRNA versus protein level is subject to ongoing debate. To address this question in a dynamic system, we mapped the proteomics and transcriptomics changes in mammalian cells responding to stress induced by dithiothreitol over 30 hours. Specifically, we estimated the kinetic parameters for synthesis and degradation of RNA and proteins, and deconvoluted response patterns common and unique to each regulatory level using a new statistical tool. Overall, both regulatory levels were equally important, but differed in their impact on molecule concentrations. Both mRNA and protein changes peaked between two and eight hours, but mRNA expression fold changes were much smaller than those of the proteins. Further, mRNA concentrations were regulated in a transient, spike-like pattern and returned to values close to pre-treatment levels by the end of the experiment. In contrast, protein concentrations switched only once and established a new steady state, consistent with the dominant role of protein regulation during misfolding stress. Finally, we generated hypotheses on specific regulatory modes for example groups of genes.
Chromatin accessibility is integral to the process by which transcription factors (TFs) read out cis-regulatory DNA sequences, but it is difficult to differentiate between TFs that drive accessibility and those that do not. Deep learning models that learn complex sequence rules provide an unprecedented opportunity to dissect this problem. Using zygotic genome activation in the Drosophila embryo as a model, we generated high-resolution TF binding and chromatin accessibility data, analyzed the data with interpretable deep learning, and performed genetic experiments for validation. We uncover a clear hierarchical relationship between the pioneer TF Zelda and the TFs involved in axis patterning. Zelda consistently pioneers chromatin accessibility proportional to motif affinity, while patterning TFs augment chromatin accessibility in sequence contexts in which they mediate enhancer activation. We conclude that chromatin accessibility occurs in two phases: one through pioneering, which makes enhancers accessible but not necessarily active, and a second when the correct combination of transcription factors leads to enhancer activation.