Motivation
Accurate classification of patients into molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these subgroups to cancer. The availability of multi-omics data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution.
Results
We develop PAMOGK (Pathway based Multi Omic Graph Kernel clustering), which integrates multi-omics patient data with existing biological knowledge on pathways. We develop a novel graph kernel that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To combine the multiple views of patients produced by hundreds of pathway and molecular alteration combinations, we use multi-view kernel clustering. Applying PAMOGK to kidney renal clear cell carcinoma (KIRC) patients yields four clusters with significantly different survival times (p-value = 1.24e-11). When we compare PAMOGK to eight other state-of-the-art multi-omics clustering methods, PAMOGK consistently outperforms them in its ability to partition KIRC patients into groups with different survival distributions. The discovered patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastatic spread. The pathways identified as important are highly relevant to KIRC.
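The multi-view step described above can be illustrated with a minimal sketch. It assumes each pathway and alteration-type pair has already produced a patient-by-patient kernel matrix, combines the views by uniform averaging, and clusters the result with kernel k-means. Uniform averaging is only the simplest stand-in for multi-view kernel clustering and is not claimed to be the procedure PAMOGK uses; the function names, the toy matrices `view_a` and `view_b`, and the initial assignment are all hypothetical.

```python
# Illustrative sketch only: uniform kernel averaging plus kernel k-means,
# a simple stand-in for multi-view kernel clustering (not PAMOGK's own code).

def average_kernels(kernels):
    """Combine per-view patient-by-patient kernel matrices with equal weights."""
    n, m = len(kernels[0]), len(kernels)
    return [[sum(k[i][j] for k in kernels) / m for j in range(n)] for i in range(n)]


def kernel_kmeans(K, init, iters=20):
    """Kernel k-means on a precomputed kernel K, starting from assignment `init`."""
    n = len(K)
    assign = list(init)
    n_clusters = max(assign) + 1
    for _ in range(iters):
        members = [[i for i in range(n) if assign[i] == c] for c in range(n_clusters)]
        # Mean pairwise similarity inside each cluster (squared centroid norm).
        within = [
            sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_assign = []
        for i in range(n):
            best, best_dist = assign[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                # Squared feature-space distance from patient i to centroid c.
                dist = K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]
                if dist < best_dist:
                    best, best_dist = c, dist
            new_assign.append(best)
        if new_assign == assign:
            break  # converged
        assign = new_assign
    return assign


# Two hypothetical views of four patients; both suggest groups {0, 1} and {2, 3}.
view_a = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
          [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
view_b = [[1.0, 0.7, 0.3, 0.1], [0.7, 1.0, 0.1, 0.3],
          [0.3, 0.1, 1.0, 0.7], [0.1, 0.3, 0.7, 1.0]]

combined = average_kernels([view_a, view_b])
labels = kernel_kmeans(combined, init=[0, 0, 0, 1])
print(labels)  # [0, 0, 1, 1]
```

Kernel k-means depends on its starting assignment, so in practice one would run it from several initializations or use a spectral relaxation instead.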
Availability
github.com/tastanlab/pamogk
Supplementary information
Supplementary data are available at Bioinformatics online.
Yasin Ilkagan Tepeli 1 [0000-0002-3375-6678], Ali Burak Ünal 2,3 [0000-0002-7279-620X], Furkan Mustafa Akdemir 3 [0000-0003-0948-5756], and Oznur Tastan 1 [0000-0001-7058-5372]
Abstract. Accurate classification of patients into homogeneous molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these different subtypes to cancer. However, the extensive molecular heterogeneity observed among cancer patients presents a challenge. The availability of multi-omic data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. In this work, we develop PAMOGK, which integrates multi-omics patient data and incorporates existing knowledge on biological pathways. PAMOGK is well suited to dealing with the sparsity of alterations when assessing patient similarities. We develop a novel graph kernel, the smoothed shortest path graph kernel, which evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To combine the multiple views of patients produced by hundreds of pathway and molecular alteration combinations, PAMOGK uses multi-view kernel clustering. We apply PAMOGK to find subgroups of kidney renal clear cell carcinoma (KIRC) patients, which results in four clusters with significantly different survival times (p-value = 7.4e-10). The patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastatic spread. When we compare PAMOGK to eight other state-of-the-art multi-omics clustering methods, PAMOGK consistently outperforms them in its ability to partition patients into groups with different survival distributions. PAMOGK also enables extracting the relative importance of pathways and molecular data types. PAMOGK is available at github.com/tastanlab/pamogk.
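To give a sense of how a smoothed shortest path kernel can cope with sparse alterations, the following is a rough sketch under stated assumptions, not the authors' implementation. It assumes an undirected toy pathway, binary alteration labels on genes, label smoothing by a few rounds of neighbor propagation with restart weight `1 - alpha`, and a kernel that sums products of smoothed labels along one shortest path per gene pair. All function names, the parameter `alpha`, and the four-gene chain are hypothetical.

```python
from collections import deque


def smooth_labels(adj, labels, alpha=0.5, iters=20):
    """Spread alteration labels to neighbouring genes (propagation with restart)."""
    current = dict(labels)
    for _ in range(iters):
        nxt = {}
        for v in adj:
            # Each neighbour u passes on an equal share of its current score.
            spread = sum(current[u] / len(adj[u]) for u in adj[v])
            nxt[v] = alpha * spread + (1 - alpha) * labels[v]
        current = nxt
    return current


def shortest_path(adj, src, dst):
    """Return one shortest path from src to dst via breadth-first search."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                queue.append(u)
    return None  # dst is unreachable from src


def pathway_kernel(adj, labels_a, labels_b, alpha=0.5):
    """Similarity of two patients on one pathway for one alteration type."""
    sa = smooth_labels(adj, labels_a, alpha)
    sb = smooth_labels(adj, labels_b, alpha)
    nodes = sorted(adj)
    total = 0.0
    for i, s in enumerate(nodes):
        for t in nodes[i + 1:]:
            path = shortest_path(adj, s, t)
            if path:
                # Compare smoothed labels gene by gene along the path.
                total += sum(sa[v] * sb[v] for v in path)
    return total


def patient(adj, altered):
    """Binary label vector: 1.0 for altered genes, 0.0 otherwise."""
    return {v: 1.0 if v in altered else 0.0 for v in adj}


# Hypothetical pathway: a chain of four genes A - B - C - D.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
p1, p2, p3 = patient(adj, {"A"}), patient(adj, {"B"}), patient(adj, {"D"})

k12 = pathway_kernel(adj, p1, p2)  # alterations in adjacent genes
k13 = pathway_kernel(adj, p1, p3)  # alterations at opposite ends
print(k12 > k13)  # True
```

With `alpha=0.0` no smoothing takes place, and patients whose altered genes do not overlap get a similarity of zero; smoothing is what lets alterations in nearby genes count as similar.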