Large-scale integrated cancer genome characterization efforts, including The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, have created unprecedented opportunities to study cancer biology with knowledge of the entire catalog of genetic alterations. A clinically important challenge is to discover cancer subtypes and their molecular drivers in a comprehensive genetic context. Curtis et al. [Nature (2012) 486(7403):346-352] have recently shown that integrative clustering of copy number and gene expression data in 2,000 breast tumors reveals novel subgroups, beyond the classic expression subtypes, that show distinct clinical outcomes. To extend the scope of integrative analysis to include somatic mutation data from massively parallel sequencing, we propose a framework for joint modeling of the discrete and continuous variables that arise from integrated genomic, epigenomic, and transcriptomic profiling. The core idea is motivated by the hypothesis that diverse molecular phenotypes can be predicted by a set of orthogonal latent variables that represent distinct molecular drivers and thus can reveal tumor subgroups of biological and clinical importance. Using the Cancer Cell Line Encyclopedia dataset, we demonstrate that our method can accurately group cell lines by their cell of origin for several cancer types and precisely pinpoint their known and potential cancer driver genes. Our integrative analysis also demonstrates the power to reveal subgroups that are not lineage-dependent but instead consist of different cancer types driven by a common genetic alteration. Application to The Cancer Genome Atlas colorectal cancer data reveals distinct integrated tumor subtypes, suggesting different genetic pathways in colon cancer progression.

A major goal of many cancer genome projects is to characterize key genetic alterations in cancer and discover therapeutic targets through comprehensive genomic profiling of the cancer genome.
The Cancer Genome Atlas (TCGA) studies have unveiled the genetic landscape of several cancer types by whole-genome and whole-exome sequencing, DNA copy number profiling, promoter methylation profiling, and mRNA expression profiling in a large number of tumors (1-5). Complementary to the tumor project, the Cancer Cell Line Encyclopedia (CCLE) (6) and the Sanger cell line project (7) have cataloged genetic and molecular data for almost 1,000 human cancer cell lines, coupled with pharmacological profiles for a large panel of anticancer drugs. These large-scale integrative genomic efforts have been geared toward comprehensively cataloging individual genomic alterations, analogous to a reverse-engineering process in which thousands of individual cancer genomes are taken apart to shed light on common biological principles. Unfortunately, cancer genomes exhibit considerable heterogeneity, with abnormalities occurring in different genes in different individuals, which makes it very challenging to identify the genes with functional importance and therapeutic implications. Thus, there is a...