We developed Lisa (http://lisa.cistrome.org) to predict the transcriptional regulators (TRs) of differentially expressed or co-expressed gene sets. Based on the input gene sets, Lisa first uses compendia of public histone mark ChIP-seq and chromatin accessibility profiles to construct a chromatin model related to the regulation of these genes. Then, using TR ChIP-seq peaks or imputed TR binding sites, Lisa probes the chromatin models using in silico deletion to find the most relevant TRs. Applied to gene sets derived from targeted TF perturbation experiments, Lisa boosted the performance of imputed TR cistromes and outperformed alternative methods in identifying the perturbed TRs.

Keywords
Transcription factors, gene regulation, chromatin accessibility, DNase-seq, H3K27ac ChIP-seq, differential gene expression, gene set analysis

List of abbreviations
TF: transcription factor
CR: chromatin regulator
TR: transcriptional regulator
RP: regulatory potential
ISD: in silico deletion
ROC: receiver operating characteristic
AUC: area under curve
ChIP-seq: chromatin immunoprecipitation followed by DNA sequencing
DNase-seq: DNase I digestion followed by DNA sequencing
H3K27ac: histone H3 lysine 27 acetylation
AR: Androgen Receptor
ER: Estrogen Receptor
GR: Glucocorticoid Receptor

Introduction
Transcriptional regulators (TRs), which include transcription factors (TFs) and chromatin regulators (CRs), play essential roles in controlling normal biological processes and are frequently implicated in disease [1][2][3][4]. The genomic landscapes of TF binding sites and histone modifications collectively shape the transcriptional regulatory environments of genes [5-8]. ChIP-seq has been widely used to map the genome-wide set of cis-elements bound by trans-acting factors such as TFs and CRs, which we henceforth refer to as "cistromes" [9].
There are approximately 1,500 transcription factors in human and mouse [10,11], regulating a wide variety of biological processes in constitutive or cell-type-specific manners, and tens of thousands of ChIP-seq and DNase-seq experiments have been performed in human and mouse. We previously developed the Cistrome Data Browser (DB) [12], a collection of uniformly processed TF ChIP-seq (~11,000) and chromatin profiles (~12,000 histone mark ChIP-seq and DNase-seq) in human and mouse.
The question we address in this paper is how to effectively use these data to infer the TRs that regulate a query gene set derived from differential or correlated gene expression analyses in human or mouse. TR ChIP-seq data, when available, are the most accurate data type representing TR binding. ChIP-seq data availability, in terms of covered TRs and cell types, is still sparse, even with large contributions from projects such as ENCODE [13], due to the limited availability of specific antibodies. Although advances have been made in TR cistrome mapping with the introduction of technolo...