Motivation: The Brain and Muscle ARNT-Like 1 protein (BMAL1) forms a heterodimer with either Circadian Locomotor Output Cycles Kaput (CLOCK) or Neuronal PAS domain protein 2 (NPAS2) to act as a master regulator of the mammalian circadian clock gene network. The dimer binds to E-box gene regulatory elements, activating downstream transcription of clock genes. Identifying transcription factor binding sites and the features that correlate with DNA binding by BMAL1 is challenging because CLOCK-BMAL1 and NPAS2-BMAL1 bind to several distinct variants of the E-box motif (CANNTG) on DNA.
Results: Using three types of tissue-specific machine learning models, with features based on 1) DNA sequence, 2) DNA sequence plus DNA shape, and 3) DNA sequence and shape plus histone modifications, we developed an interpretable predictive model of genome-wide BMAL1 binding to E-box motifs and dissected the mechanisms underlying BMAL1-DNA binding. Our results indicate that histone modifications, the local shape of the DNA, and the flanking sequence of the E-box motif are sufficient predictive features for BMAL1-DNA binding. Our models also provide mechanistic insights into the tissue specificity of DNA binding by BMAL1.
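To make the CANNTG motif and its flanking sequence concrete, the following is a minimal sketch of how candidate E-box sites and their flanks could be located in a DNA string. It is an illustration only and not the pipeline used in this work: the function name `find_eboxes`, the default flank width of 3 bp, and the example sequence are all assumptions.

```python
import re

# Degenerate E-box consensus CANNTG, where N is any of A, C, G, T.
# The zero-width lookahead lets overlapping sites (e.g. in CACATGTG) be reported.
# CANNTG is its own reverse complement, so scanning one strand finds all sites.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq, flank=3):
    """Return (start, motif, left_flank, right_flank) for each CANNTG site.

    `flank` is a hypothetical window size chosen for illustration; flanks are
    truncated at the ends of the sequence.
    """
    seq = seq.upper()
    hits = []
    for match in EBOX.finditer(seq):
        start = match.start()
        end = start + 6  # the E-box core is 6 bp long
        left = seq[max(0, start - flank):start]
        right = seq[end:end + flank]
        hits.append((start, match.group(1), left, right))
    return hits


# Made-up example sequence containing two E-box variants.
hits = find_eboxes("ttCACGTGaaCAGCTGcc")
for start, motif, left, right in hits:
    print(start, motif, left, right)
```

Each returned tuple gives the 0-based start of the site, the six-base core, and the bases on either side, which is the kind of raw material from which sequence-based features could be derived.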