Despite the rapid accumulation of tumor-profiling data and transcription factor (TF) ChIP-seq profiles, efforts to integrate TF binding with tumor-profiling data to understand how TFs regulate tumor gene expression remain limited. To systematically search for cancer-associated TFs, we comprehensively integrated 686 ENCODE ChIP-seq profiles representing 150 TFs with 7484 TCGA tumor profiles in 18 cancer types. For efficient and accurate inference of gene regulatory rules across a large number and variety of datasets, we developed an algorithm, RABIT (regression analysis with background integration). In each tumor sample, RABIT tests whether the TF target genes identified by ChIP-seq show strong differential regulation after controlling for background effects from copy number alteration and DNA methylation. When multiple ChIP-seq profiles are available for a TF, RABIT prioritizes the most relevant ChIP-seq profile in each tumor. In each cancer type, RABIT further tests whether variations in the TF's expression and somatic mutation status across tumors are correlated with the differential expression patterns of its target genes. Our predicted TF impacts on tumor gene expression are highly consistent with knowledge from cancer-related gene databases and reveal many previously unidentified aspects of transcriptional regulation in tumor progression. We also applied RABIT to RNA-binding protein motifs and found that some alternative splicing factors could affect tumor-specific gene expression by binding to the 3′UTR regions of their target genes. Thus, RABIT (rabit.dfci.harvard.edu) is a general platform for predicting the oncogenic role of gene expression regulators.

regulatory inference | tumor profiling | transcription factor |
RNA-binding protein

Tumorigenesis is a multistep process requiring alterations in gene expression programs (1, 2). Transcription factors (TFs) are instrumental in driving these programs, and misregulation of these TFs can result in the acquisition of tumor-related properties (3). For example, E2F1 is overexpressed in many cancer types and promotes tumor proliferation by regulating the expression of genes involved in cell differentiation, metabolism, and development (4). As another example, FOXM1 plays an important role in promoting cell proliferation and cell cycle progression through transcriptional activation of many G2/M-specific genes. Increased FOXM1 expression has been detected in numerous cancer types, and FOXM1 is a promising therapeutic target for cancer treatment (5). TFs also play critical roles in inducing a tumor microenvironment conducive to metastasis. For example, SNAI1/2, TWIST, and ZEB1/2 orchestrate the expression of genes involved in cell polarity, cell-cell contact, cytoskeleton structure, and extracellular matrix degradation. The joint effect of these TFs promotes cancer cell motility and invasion in the metastatic process (2, 6).

With the rapid development of high-throughput technologies, large numbers of datasets have been generated for regulatory proteins. For example, the ENCODE project generated 689 C...
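The per-sample test summarized in the abstract (do a TF's ChIP-seq target genes show differential expression beyond what copy number alteration and DNA methylation explain?) can be illustrated with a minimal sketch. It assumes a plain ordinary least squares model with one copy-number value and one promoter-methylation value per gene, and reports the t-statistic of a binary "is ChIP-seq target" indicator. The function names `target_t_statistic` and `simulate`, the covariate encoding, and the synthetic data are all hypothetical; RABIT's actual regression, profile prioritization, and cross-tumor correlation steps are not specified in this section and may differ.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def target_t_statistic(diff_expr, is_target, copy_number, methylation):
    """OLS t-statistic of the TF-target indicator in one tumor sample.

    Hypothetical simplified model, one row per gene g:
        diff_expr_g = b0 + b1*copy_number_g + b2*methylation_g + b3*is_target_g + e_g
    A large positive t for b3 means ChIP-seq targets are up-regulated beyond
    what the background covariates explain; a large negative t, down-regulated.
    """
    X = [[1.0, cn, me, 1.0 if t else 0.0]
         for t, cn, me in zip(is_target, copy_number, methylation)]
    n, p = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, diff_expr)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(X, diff_expr)]
    sigma2 = sum(r * r for r in resid) / (n - p)  # residual variance
    se = math.sqrt(sigma2 * inv[p - 1][p - 1])     # standard error of b3
    return beta[p - 1] / se


def simulate(n_genes, target_effect, confounded, seed):
    """Synthetic single-sample data; every value is made up for illustration."""
    rng = random.Random(seed)
    cn = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]  # copy number (log-scale)
    me = [rng.random() for _ in range(n_genes)]          # methylation in [0, 1)
    if confounded:
        # Targets preferentially sit in amplified regions.
        tgt = [c + rng.gauss(0.0, 1.0) > 1.0 for c in cn]
    else:
        tgt = [rng.random() < 0.2 for _ in range(n_genes)]
    y = [0.8 * c - 0.5 * m + target_effect * t + rng.gauss(0.0, 0.5)
         for c, m, t in zip(cn, me, tgt)]
    return y, tgt, cn, me
```

In the confounded simulation there is no true target effect, yet targets look up-regulated in a raw mean comparison because they are enriched in amplified regions; including copy number as a covariate brings the target statistic back toward zero. This is the kind of background effect the abstract says RABIT controls for, shown here only under the stated toy assumptions.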