The increasing availability of network data is creating great potential for knowledge discovery from graph data. In many applications, feature vectors are given in addition to graph data: nodes represent entities, edges represent relationships between entities, and the feature vectors associated with the nodes represent properties of the entities. Features and edges often contain complementary information. In such scenarios, the simultaneous use of both data types promises more meaningful and accurate results. Along these lines, we introduce the novel problem of mining cohesive patterns from graphs with feature vectors, which combines the concepts of dense subgraphs and subspace clusters into a very expressive problem definition. A cohesive pattern is a dense and connected subgraph that has homogeneous values in a large enough feature subspace. We argue that this problem definition is natural for identifying small communities in social networks and functional modules in protein-protein interaction networks. We present the algorithm CoPaM (Cohesive Pattern Miner), which exploits various pruning strategies to efficiently find all maximal cohesive patterns. Our theoretical analysis proves the correctness of CoPaM, and our experimental evaluation demonstrates its effectiveness and efficiency.
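To make the definition of a cohesive pattern concrete, the following is a minimal sketch of a membership check for a single candidate node set. It is not CoPaM itself and does no pruning or enumeration. The abstract does not give the exact constraints, so everything here is an assumption for illustration: density is taken as the fraction of possible edges present, a feature dimension counts as homogeneous when its value range over the node set is at most `max_range`, and the function names and thresholds (`min_density`, `max_range`, `min_dims`) are hypothetical.

```python
from collections import deque
from itertools import combinations


def is_connected(nodes, adj):
    """Breadth-first search restricted to the subgraph induced by `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def density(nodes, adj):
    """Assumed density measure: edges present divided by edges possible."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj.get(u, ()))
    return 2.0 * edges / (n * (n - 1))


def homogeneous_dims(nodes, features, max_range):
    """Assumed homogeneity test: value range within a dimension <= max_range."""
    nodes = list(nodes)
    n_dims = len(features[nodes[0]])
    dims = []
    for d in range(n_dims):
        values = [features[u][d] for u in nodes]
        if max(values) - min(values) <= max_range:
            dims.append(d)
    return dims


def is_cohesive_pattern(nodes, adj, features, min_density, max_range, min_dims):
    """Connected, dense enough, and homogeneous in at least `min_dims` dimensions."""
    return (
        is_connected(nodes, adj)
        and density(nodes, adj) >= min_density
        and len(homogeneous_dims(nodes, features, max_range)) >= min_dims
    )


# Toy undirected graph (symmetric adjacency sets) with 3-dimensional features.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
features = {
    "a": (0.10, 0.9, 0.5),
    "b": (0.20, 0.1, 0.5),
    "c": (0.15, 0.5, 0.6),
    "d": (0.90, 0.5, 0.0),
}

triangle_ok = is_cohesive_pattern({"a", "b", "c"}, adj, features, 0.6, 0.2, 2)
with_d_ok = is_cohesive_pattern({"a", "b", "c", "d"}, adj, features, 0.6, 0.2, 2)
```

In this toy data the triangle `a`, `b`, `c` is fully connected and agrees closely in the first and third feature dimensions, so it passes; adding `d` keeps the subgraph connected and dense enough but leaves no homogeneous dimension, so the larger set fails. An actual miner would additionally need to search the space of node sets and subspaces and report only maximal patterns.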
Motivation: Recent genomic studies have confirmed that cancer is of utmost phenotypical complexity, varying greatly in terms of subtypes and evolutionary stages. When classifying cancer tissue samples, subnetwork marker approaches have proven to be superior to single gene marker approaches, most importantly in cross-platform evaluation schemes. However, prior subnetwork-based approaches do not explicitly address the great phenotypical complexity of cancer.
Results: We explicitly address this and employ density-constrained biclustering to compute subnetwork markers, which reflect pathways being dysregulated in many, but not necessarily all, samples under consideration. In breast cancer, we achieve substantial improvements over all cross-platform applicable approaches when predicting TP53 mutation status in a well-established non-cross-platform setting. In colon cancer, we raise prediction accuracy in the most difficult instances from 87% to 93% for cancer versus non-cancer and from 83% to (astonishing) 92% for with versus without liver metastasis, in well-established cross-platform evaluation schemes.
Availability: Software is available on request.
Contact: alexsch@math.berkeley.edu; ester@cs.sfu.ca
Supplementary information: Supplementary data are available at Bioinformatics online.