Computational integrative analysis has become a significant approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating them is complicated by the lack of gold standards. Moreover, questions of practical importance remain open regarding how the choice of data types and their combinations affects the performance of integrative studies. Here, we constructed three classes of benchmarking datasets for nine cancers in TCGA by considering all eleven combinations of four multi-omics data types. Using these datasets, we conducted a comprehensive evaluation of ten representative integration methods for cancer subtyping in terms of accuracy (measured by combining clustering accuracy and clinical significance), robustness, and computational efficiency. We subsequently investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Refuting the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that there are situations where integrating more omics data negatively impacts the performance of integration methods. Our analyses also suggested several effective combinations for most cancers in our study, which may be of particular interest to researchers in omics data analysis.
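The count of eleven combinations is consistent with taking every subset of at least two of the four data types (6 pairs + 4 triples + 1 quadruple). The short sketch below checks that arithmetic under this reading. The data type names are placeholders, since the abstract does not list them, and the helper name `multi_omics_combinations` is hypothetical.

```python
from itertools import combinations

# Placeholder labels: the abstract says "four multi-omics data types"
# without naming them, so these names are assumptions for illustration.
OMICS_TYPES = ["omics_A", "omics_B", "omics_C", "omics_D"]


def multi_omics_combinations(types):
    """Return every subset that contains at least two data types."""
    combos = []
    for size in range(2, len(types) + 1):
        combos.extend(combinations(types, size))
    return combos


combos = multi_omics_combinations(OMICS_TYPES)
print(len(combos))  # 6 pairs + 4 triples + 1 quadruple = 11
```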
Community or module detection is a fundamental problem in complex networks. Most of the traditional algorithms available focus only on vertices in a subgraph that are densely connected among themselves while being loosely connected to the vertices outside the subgraph, ignoring the topological structure of the community. However, in most cases one needs to make further analysis on the interior topological structure of communities to obtain various meaningful subgroups. We thus propose a novel community referred to as a cograph community, which has a well-understood structure. The well-understood structure of cographs and their corresponding cotree representation allows for an immediate identification of structurally-equivalent subgroups. We develop an algorithm called the Edge P4 centrality-based divisive algorithm (EPCA) to detect these cograph communities; this algorithm is efficient, free of parameters and independent of additional measures mainly due to the novel local edge P4 centrality measure. Further, we compare the EPCA with algorithms from the existing literature on synthetic, social and biological networks to show it has superior or competitive performance in accuracy. In addition to the computational advantages over other community-detection algorithms, the EPCA provides a simple means of discovering both dense and sparse subgroups based on structural equivalence or homogeneous roles which may otherwise go undetected by other algorithms which rely on edge density measures for finding subgroups.
This approach has been used to attempt to detect structures such as the clique [5], quasi-clique [6,7], n-club, n-clan, k-plex, etc. [1] as the expected community or module structure in complex networks or to characterize the topological structures of communities based on statistical methods [8].
These algorithms can obtain specific graceful topological structures, but they suffer from prohibitive computational complexity due to the inherent combinatorial complexity of the prime graphs on large-scale practical complex networks. The familial groups in social networks proposed by Nastos and Gao [4] and their corresponding comparability tree arrangements of the groups are one example in which the structural definition of a community reveals much interior structure in the communities, but they also show that the computational problem of detecting these groups is NP-complete. It does, however, open new strategies for defining communities or modules by structural analyses.
From the viewpoint of structural analyses, we consider not only the traditional macroscopic clustering property of communities being internally dense while being externally sparse but also the topological structure of the communities found. We propose a polynomial-time approach of network partitioning, called the EPCA, which detects connected cograph communities in a network. A graph (network) is a cograph when it excludes a specific subgraph configuration called a P4, defined in the next section. Cographs have attracted persistent attention lately...
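To make the cograph condition concrete: in standard graph theory a P4 is an induced path on four vertices, so a graph is a cograph exactly when no four vertices induce such a path. The sketch below is a brute-force check for small graphs only. It is an illustration of the definition, not the EPCA itself, and the function names and the adjacency-dict representation are assumptions.

```python
from itertools import combinations, permutations


def is_induced_p4(adj, a, b, c, d):
    """True if a-b-c-d is a path and no other edge joins these four vertices."""
    return (
        b in adj[a] and c in adj[b] and d in adj[c]  # the three path edges
        and c not in adj[a] and d not in adj[a] and d not in adj[b]  # no chords
    )


def is_cograph(adj):
    """Brute-force test: a graph is a cograph iff it has no induced P4.

    adj maps each vertex to the set of its neighbours (undirected graph).
    Runs in O(n^4) time, so it is only suitable for small examples.
    """
    for quad in combinations(adj, 4):
        for ordering in permutations(quad):
            if is_induced_p4(adj, *ordering):
                return False
    return True


# A path on four vertices is itself a P4, so it is not a cograph.
path4 = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
# A 4-cycle has only two non-adjacent pairs, so it cannot contain an induced P4.
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}

print(is_cograph(path4), is_cograph(cycle4))
```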
Community detection has been extensively studied in the past decades, largely because communities exist in various networks such as technological, social and biological networks. Most of the available algorithms, however, only focus on the properties of the vertices, ignoring the roles of the edges. To explore the roles of the edges in the networks for community discovery, the authors introduce a novel edge centrality based on the antitriangle property of an edge. To investigate how the edge centrality characterises the community structure, they develop an approach based on the edge antitriangle centrality with the isolated vertex handling strategy (EACH) for community detection. EACH first calculates the edge antitriangle centrality scores for all the edges of a given network and removes the edge with the highest score per iteration until the scores of the remaining edges are all zero. Furthermore, EACH is free of parameters and independent of any additional measures to determine the community structure. To demonstrate the effectiveness of EACH, they compare it with state-of-the-art algorithms on both synthetic networks and real-world networks. The experimental results show that EACH is more accurate and has lower complexity in terms of community discovery; in particular, it yields inherent and consistent communities with a maximal diameter of four jumps.
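The divisive loop described above (score every edge, remove the top-scoring edge, stop when all scores are zero) can be sketched generically. The abstract does not define the antitriangle centrality, so the scoring function below is a stand-in chosen only for illustration: it counts induced four-vertex paths that have the edge in the middle. The isolated vertex handling strategy is also omitted. All function names here are hypothetical.

```python
def middle_p4_count(adj, u, v):
    """Stand-in edge score (NOT the paper's antitriangle centrality):
    number of induced paths a-u-v-d that have (u, v) as the middle edge."""
    count = 0
    for a in adj[u] - {v}:
        for d in adj[v] - {u}:
            if a != d and a not in adj[v] and d not in adj[u] and d not in adj[a]:
                count += 1
    return count


def connected_components(adj):
    """Return the connected components of the graph as a list of vertex sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


def divisive_communities(edges, score_fn=middle_p4_count):
    """Remove the highest-scoring edge per iteration until every score is zero.

    Vertices are assumed to be mutually comparable (e.g. all integers) so that
    each undirected edge can be stored once as an ordered pair.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while True:
        # Rescore all remaining edges, since a removal changes neighbourhoods.
        scores = {(u, v): score_fn(adj, u, v) for u in adj for v in adj[u] if u < v}
        if not scores or max(scores.values()) == 0:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return connected_components(adj)


# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
communities = divisive_communities(edges)
print(sorted(sorted(c) for c in communities))
```

In this toy graph only the bridge has a nonzero stand-in score, so it is the only edge removed and the two triangles are returned as communities. Passing a different `score_fn` swaps in another edge centrality without changing the loop.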
Cancer subtypes can improve our understanding of cancer and suggest more precise treatment for patients. Multi-omics molecular data can characterize cancers at different levels. To date, many computational methods that integrate multi-omics data for cancer subtyping have been proposed. However, there are no consistent criteria to evaluate the integration methods due to the lack of gold standards (e.g., the number of subtypes in a specific cancer). Since comprehensive evaluation and comparison of different methods serve as a useful guideline for users selecting an optimal method for their own purpose, we develop a scalable platform, CEPICS, for comprehensively evaluating and comparing multi-omics data integration methods in cancer subtyping. Given a user-specified maximum number of subtypes, k-max, CEPICS provides (1) cancer subtyping results using up to five built-in state-of-the-art integration methods for each number of subtypes from two to k-max, (2) a report including the evaluation of each user-selected method and comparisons across them using clustering performance metrics and clinical survival analysis, and (3) an overall analysis of subtyping results by different methods representing a robust cancer subtype prediction for samples. Furthermore, users can upload subtyping results of their own methods to compare with the built-in methods. CEPICS is implemented as an R package and is freely available at .
As smoking rates decrease, proportionally more cases of lung adenocarcinoma occur in never-smokers, while aberrant DNA methylation has been suggested to contribute to the tumorigenesis of lung adenocarcinoma. It is extremely difficult to distinguish, among a large number of differentially methylated genes, which genes play key roles in tumorigenic processes via DNA methylation-mediated gene silencing. By integrating gene expression and DNA methylation data, a pipeline combined with differential network analysis is designed to uncover driver methylation genes and responsive modules, which demonstrate distinctive expression and network topology in tumors with aberrant DNA methylation. In total, 135 genes are recognized as candidate driver genes in early stage lung adenocarcinoma, and the 30 top-ranked genes are recognized as driver methylation genes. Functional annotation and the differential network analysis indicate the roles of the identified driver genes in tumorigenesis, while a literature study reveals significant correlations of the top 30 genes with early stage lung adenocarcinoma in never-smokers. The analysis pipeline can also be employed to identify driver epigenetic events in other cancers characterized by matched gene expression and DNA methylation data.