SUMMARY
Although DNA modifications play an important role in gene regulation, the underlying mechanisms remain elusive. We developed EpiSELEX-seq to probe the sensitivity of transcription factor binding to DNA modification in vitro using massively parallel sequencing. Feature-based modeling quantifies the effect of cytosine methylation (5mC) on binding free energy in a position-specific manner. Application to the human bZIP proteins ATF4 and C/EBPβ, and three different Pbx-Hox complexes shows that 5mCpG can both increase and decrease affinity, depending on where the modification occurs within the protein-DNA interface. The TF paralogs tested vary in their methylation sensitivity, for which we provide a structural rationale. We show that 5mCpG can also enhance in vitro p53 binding, and provide evidence for increased in vivo p53 occupancy at methylated binding sites, correlating with primed-enhancer histone marks. Our results establish a powerful strategy for dissecting epigenomic modulation of protein-DNA interactions and their role in gene regulation.
Haploinsufficiency is a major mechanism of genetic risk in developmental disorders. Accurate prediction of haploinsufficient genes is essential for prioritizing and interpreting deleterious variants in genetic studies. Current methods based on mutation intolerance in population data suffer from inadequate power for genes with short transcripts. Here we show haploinsufficiency is strongly associated with epigenomic patterns, and develop a computational method (Episcore) to predict haploinsufficiency leveraging epigenomic data from a broad range of tissue and cell types by machine learning methods. Based on data from recent exome sequencing studies on developmental disorders, Episcore achieves better performance in prioritizing likely-gene-disrupting (LGD) de novo variants than current methods. We further show that Episcore is less-biased by gene size, and complementary to mutation intolerance metrics for prioritizing LGD variants. Our approach enables new applications of epigenomic data and facilitates discovery and interpretation of novel risk variants implicated in developmental disorders.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.