Outcomes for cancer patients vary greatly even within the same tumor type, and characterization of molecular subtypes of cancer holds important promise for improving prognosis and personalized treatment. This promise has motivated recent efforts to produce large amounts of multidimensional genomic ('multi-omic') data, but current algorithms still face challenges in the integrated analysis of such data. Here we present Cancer Integration via Multikernel Learning (CIMLR; based on an algorithm originally developed for analysis of single-cell RNA-Seq data), a new cancer subtyping method that integrates multi-omic data to reveal molecular subtypes of cancer. We apply CIMLR to multi-omic data from 32 cancer types and show significant improvements in both computational efficiency and ability to extract biologically meaningful cancer subtypes. The discovered subtypes exhibit significant differences in patient survival for 21 of the 32 studied cancer types. Our analysis reveals integrated patterns of gene expression, methylation, point mutations and copy number changes in multiple cancers and highlights patterns specifically associated with poor patient outcomes.
bioRxiv preprint first posted online Feb. 16, 2018; doi: http://dx.doi.org/10.1101/267245. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
Introduction
Cancer is a heterogeneous disease that evolves through many pathways, involving changes in the activity of multiple oncogenes and tumor suppressor genes. The basis for such changes is the vast number and diversity of somatic alterations that produce complex molecular and cellular phenotypes, ultimately influencing each individual tumor's behavior and response to treatment. Because mutations and molecular mechanisms are so diverse, outcomes vary greatly; it is therefore important to identify cancer subtypes based on common molecular features and then correlate those subtypes with outcomes. This will lead to an improved understanding of the pathways by which cancer commonly evolves, as well as better prognosis and personalized treatment.
Efforts to distinguish subtypes are complicated by the many kinds of genomic changes that contribute to cancer: for example, point mutations, DNA copy number aberrations, DNA methylation, gene expression, protein levels, and post-translational modifications. While gene expression clustering has often been used to discover subtypes (e.g., the PAM50 subtypes [1] of breast cancer), analysis of a single data type does not typically capture the full complexity of a tumor genome and its molecular phenotypes.
For example, a copy number change may be biologically relevant only if it causes a gene expression change; gene expression data alone ignores point mutations that may alter the function of the gene product; and point mutations in two different genes may have the same downstream effect, which may become apparent only when also considering methylation or gene expression. Therefore, comprehen...