A major challenge in cancer genomics is to identify genes with functional roles in cancer and uncover their mechanisms of action. Here, we introduce a unified analytical framework that enables rapid integration of multiple sources of information in order to identify cancer-relevant genes by pinpointing those whose interaction or other functional sites are enriched in somatic mutations across tumors. Our accompanying method PertInInt combines knowledge about sites participating in interactions with DNA, RNA, peptides, ions or small molecules with domain, evolutionary conservation and gene-level mutation data. When applied to 10,037 tumor samples across 33 cancer types, PertInInt uncovers both known and newly predicted cancer genes, while simultaneously revealing whether interaction potential or other functionalities are disrupted. PertInInt's analysis demonstrates that somatic mutations are frequently enriched in binding residues and domains in oncogenes and tumor suppressors, and implicates interaction perturbation as a pervasive cancer driving event. (Software at http://github.com/Singh-Lab/PertInInt.)
Large-scale, concerted oncogenomic consortia have recently sequenced an unprecedented number of tumor genomes from thousands of patients across tens of cancer types [1,2]. Analyses of these datasets promise the opportunity for improved diagnosis and additional insights into the genetic underpinnings of a staggeringly complex and heterogeneous disease [3]. More broadly, the comprehensive detection of cancer-driving mutational events, coupled with a mechanistic understanding of their functional impact, has the potential to expand our knowledge of altered cellular processes in tumors, to reveal actionable, genetic similarities between different cancer types, and to inform how evolving, heterogeneous populations of tumor cells may impact therapeutic efficacy [4][5][6].
A crucial first step toward these goals, differentiating the small fraction of somatic mutations with functional roles in cancer ("drivers") from the preponderance of neutral "passenger" mutations, still poses a substantial computational obstacle [7]. While initial attempts to uncover cancer drivers at the gene level based on frequency of mutation across tumor samples have been fruitful [8,9], such gene-centric, recurrence-based approaches are inherently unable to detect infrequently mutated driver genes and also cannot distinguish amongst mutations within the same gene that may lead to distinct tumor phenotypes or clinical responses [10]. In order to address the critical need to detect and interpret rare mutational driver events at the subgene level [11], an emerging class of approaches has begun to combine somatic mutation information with additional knowledge regarding protein site functionality, derived from analyses of evolutionary conservation [12][13][14], three-dimensional structure [15][16][17][18][19][20], domains [21,22], or post-translational modification [23,24]. These methods, however, tend to...