Plants respond to their environment by dynamically modulating gene expression. A powerful approach for understanding how these responses are regulated is to integrate information about cis-regulatory elements (CREs) into models called cis-regulatory codes. Transcriptional response to combined stress is typically not the sum of the responses to the individual stresses. However, cis-regulatory codes underlying combined stress response have not been established. Here we modeled transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana. We grouped genes by their pattern of response (independent, antagonistic, synergistic) and trained machine learning models to predict their response using putative CREs (pCREs) as features (median F-measure = 0.64). We then developed a deep learning approach to integrate additional omics information (sequence conservation, chromatin accessibility, histone modification) into our models, improving performance by 6.2%. While pCREs important for predicting independent and antagonistic responses tended to resemble binding motifs of transcription factors associated with heat and/or drought stress, important synergistic pCREs resembled binding motifs of transcription factors not known to be associated with stress. These findings demonstrate how in silico approaches can improve our understanding of the complex codes regulating response to combined stress and help us identify prime targets for future characterization.

[...] in nature and elicit both overlapping and conflicting physiological responses in plants (36). Moreover, TFs and TF binding motifs are known for these stresses individually (37,38). To better understand the regulatory logic underlying single and combined stress, first, we grouped genes likely to be co-regulated based on their shared pattern of transcriptional response under single and combined heat and drought stress (14) (Step 1, Fig. 1).
Then, we used known TF binding motifs (TFBMs) and enrichment-based pCREs (Step 2, Fig. 1) to generate machine learning models of the cis-regulatory codes controlling these different patterns of response to single and combined heat and drought stress. To improve our models of the cis-regulatory codes, and therefore our understanding of how response to single and combined stress is regulated in A. thaliana, we modeled regulatory interactions (Step 3A, Fig. 1), used a deep learning approach to integrate additional omics information (i.e., chromatin accessibility, sequence conservation, and histone marks) into our models (Step 3B, Fig. 1), and expanded the scope of our models by including pCREs identified outside of the promoter region (Step 3C, Fig. 1). In addition to providing a comprehensive overview of the cis-regulatory codes of response to single and combined heat and drought stress in A. thaliana, this study also exemplifies how a data-driven approach can be used to make novel discoveries in a complex system like gene regulation (Step 4, Fig. 1).
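As an illustration of the modeling setup outlined above, the following minimal sketch shows one way pCREs could be encoded as features and how an F-measure could be computed. It is not the pipeline used in this study. The gene names, sequences, and motifs are hypothetical toy values, the functions `pcre_features` and `f_measure` are names introduced here for illustration, pCREs are assumed to be exact k-mers scored by presence or absence on the given strand only, and the F-measure is assumed to be the harmonic mean of precision and recall (F1).

```python
from typing import Dict, List


def pcre_features(promoters: Dict[str, str], pcres: List[str]) -> Dict[str, List[int]]:
    """Encode each gene as a binary vector: 1 if the pCRE k-mer occurs in its
    promoter sequence, 0 otherwise (forward strand only, exact match)."""
    return {gene: [int(motif in seq) for motif in pcres]
            for gene, seq in promoters.items()}


def f_measure(y_true: List[int], y_pred: List[int]) -> float:
    """F1 score: harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy promoters and pCREs (not taken from the study).
promoters = {
    "geneA": "ACGTGGCATT",
    "geneB": "TTTTCCGGAA",
    "geneC": "GACGTGTCAA",
}
pcres = ["ACGTG", "CCGG"]
features = pcre_features(promoters, pcres)

# Hypothetical labels (1 = gene belongs to a response group) and predictions.
y_true = [1, 0, 1, 1]
y_pred = [1, 1, 0, 1]
score = f_measure(y_true, y_pred)
```

In practice, a feature matrix of this kind would be passed to a classifier trained separately for each response group, and the F-measure would be evaluated on held-out genes; a fuller treatment would also scan the reverse complement strand.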
MATERIALS AND METHODS
Expression data processing, response group classificat...