Background: In recent decades, detecting protein complexes (PCs) from protein-protein interaction networks (PPINs) has been an active area of research. Many graph clustering methods work well for identifying PCs. However, most existing methods overlook the inherent core-attachment organization of PCs, and they have three major limitations. First, many methods ignore the importance of seed selection and, in particular, do not consider the impact of choosing overlapping nodes as seed nodes, which may lead to false predictions. Second, PCs are generally assumed to be dense subgraphs; however, subgraphs with a high local modularity structure usually correspond to PCs. Third, many available methods lack a mechanism for handling noise and miss some peripheral proteins. Addressing all of these issues is important for predicting more biologically meaningful overlapping PCs.
Results: To overcome these weaknesses, we propose CALM, a clustering method based on core-attachment and local modularity structure, to detect overlapping PCs from noisy weighted PPINs. First, we identify overlapping nodes and seed nodes. Second, we calculate a support function between each node and a cluster. In CALM, a cluster that initially consists of only a seed node is extended by recursively adding its direct neighboring nodes according to the support function, until the cluster forms a subgraph with locally optimal modularity. Third, we repeat this process for the remaining seed nodes. Finally, merging and removal procedures are applied to obtain the final predicted clusters. The experimental results show that CALM outperforms other classical methods and achieves strong overall performance. Furthermore, CALM matches more complexes with higher accuracy and provides a better one-to-one mapping with reference complexes on all test datasets. Additionally, CALM is robust to PPINs with high noise rates.
Conclusions: By considering core-attachment and local modularity structure, CALM detects PCs much more effectively than several representative methods. In short, CALM could potentially identify previously undiscovered overlapping PCs with varying densities and high modularity.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2309-9) contains supplementary material, which is available to authorized users.
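The seed-expansion step described in the CALM abstract can be illustrated with a minimal greedy sketch. The abstract does not give CALM's formulas, so two pieces here are assumptions: local modularity is taken to be W_in / (W_in + W_out), the weight of edges inside the cluster over the weight of all edges touching it, and `support` is a hypothetical stand-in that measures the share of a node's weighted degree going into the cluster. The recursive extension is written as an equivalent loop, and the overlap detection, merging and removal steps are not shown.

```python
from typing import Dict, Iterable, Set, Tuple

# Weighted, undirected PPIN as an adjacency map: graph[u][v] = weight of edge u-v.
Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def local_modularity(graph: Graph, cluster: Set[str]) -> float:
    """Assumed definition: W_in / (W_in + W_out), where W_in sums the weights of
    edges inside the cluster and W_out sums the weights of edges leaving it."""
    w_in = 0.0
    w_out = 0.0
    for u in cluster:
        for v, w in graph[u].items():
            if v in cluster:
                w_in += w / 2  # each internal edge is visited from both endpoints
            else:
                w_out += w
    total = w_in + w_out
    return w_in / total if total > 0 else 0.0


def support(graph: Graph, node: str, cluster: Set[str]) -> float:
    """Hypothetical support function (not CALM's published formula): the share
    of a node's weighted degree that falls on edges into the cluster."""
    degree = sum(graph[node].values())
    inside = sum(w for v, w in graph[node].items() if v in cluster)
    return inside / degree if degree > 0 else 0.0


def expand_seed(graph: Graph, seed: str) -> Set[str]:
    """Grow a cluster from one seed, adding direct neighbors while local modularity improves."""
    cluster = {seed}
    current = local_modularity(graph, cluster)
    while True:
        neighbors = {v for u in cluster for v in graph[u]} - cluster
        # Try the highest-support neighbor first; ties broken by name for determinism.
        ranked = sorted(neighbors, key=lambda v: (-support(graph, v, cluster), v))
        for v in ranked:
            score = local_modularity(graph, cluster | {v})
            if score > current:
                cluster.add(v)
                current = score
                break
        else:
            return cluster  # no neighbor improves the score: local optimum reached
```

For two triangles A-B-C and D-E-F joined by one weak edge C-D of weight 0.1 (all other weights 1.0), `expand_seed(g, "A")` returns `{"A", "B", "C"}`. Adding D would drop local modularity from 3/3.1 to 3.1/5.1.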
Genome-wide association studies (GWAS) involve the detection and interpretation of epistasis, which is responsible for the 'missing heritability' and influences susceptibility to common complex diseases. Many epistasis detection algorithms cannot be applied directly to GWAS because many combinations of genetic components are present in only a small number of samples, or in none at all. Because of the huge number of single nucleotide polymorphisms (SNPs) and the use of inappropriate statistical tests, epistasis detection remains a computational and statistical challenge in genetic epidemiology. Here, we develop a novel method that identifies epistatic interactions related to disease susceptibility using an ant colony optimization strategy implemented on Google's MapReduce platform. We incorporate expert knowledge into the pheromone updating rule to guide the ants toward the best choices during the search. Extensive experiments on simulated and real genome-wide data sets demonstrate the excellent performance of our algorithm compared with its competitors.
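A minimal sketch of the ant colony optimization loop described above may help. It uses the textbook ACO selection rule, where the probability of picking SNP j is proportional to tau_j^alpha * eta_j^beta. The way expert knowledge enters is an assumption: `eta` is a hypothetical per-SNP expert-knowledge weight used both as the heuristic term and to scale the pheromone deposit. The paper's exact updating rule, its fitness function and the MapReduce parallelization are not reproduced.

```python
import random
from typing import List, Sequence


def selection_probabilities(tau: Sequence[float], eta: Sequence[float],
                            alpha: float = 1.0, beta: float = 1.0) -> List[float]:
    """Textbook ACO rule: p_j is proportional to tau_j**alpha * eta_j**beta."""
    scores = [(t ** alpha) * (h ** beta) for t, h in zip(tau, eta)]
    total = sum(scores)
    return [s / total for s in scores]


def construct_solution(tau: Sequence[float], eta: Sequence[float], k: int,
                       rng: random.Random, alpha: float = 1.0,
                       beta: float = 1.0) -> List[int]:
    """One ant picks k distinct SNP indices by roulette-wheel selection."""
    available = list(range(len(tau)))
    chosen: List[int] = []
    for _ in range(k):
        probs = selection_probabilities([tau[i] for i in available],
                                        [eta[i] for i in available], alpha, beta)
        pick = rng.choices(available, weights=probs, k=1)[0]
        chosen.append(pick)
        available.remove(pick)  # a SNP appears at most once per combination
    return chosen


def update_pheromone(tau: Sequence[float], best: Sequence[int], eta: Sequence[float],
                     rho: float = 0.1, deposit: float = 1.0) -> List[float]:
    """Evaporate all trails, then reinforce SNPs in the best combination.
    Hypothetical: the deposit is scaled by each SNP's expert-knowledge weight."""
    new_tau = [(1.0 - rho) * t for t in tau]
    for i in best:
        new_tau[i] += deposit * eta[i]
    return new_tau
```

In a full run, each ant would call `construct_solution`, each candidate combination would be scored with an association test, and `update_pheromone` would be applied with the best combination of the iteration. The per-ant work is what a MapReduce job could distribute.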
Although genome-wide association studies play an increasingly important role in identifying the causes of complex diseases, detecting SNP epistasis in these studies remains a computational challenge. Existing methods are usually based on a single-correlation model between SNP combinations and phenotype, and their performance is often unsatisfactory. Among existing methods, the highest average power is 0.58 on DME models and 0.97 on DNME models, the highest average F-measure is 0.44 on DME models and 0.90 on DNME models, and the lowest average computation time is 2.12 s on DME models and 2.09 s on DNME models. In this work, we present SEE, a novel multi-objective evolutionary algorithm for identifying SNP epistasis. SEE integrates eight evolution objectives to measure the association between SNP combinations and phenotype, and it uses a novel evolutionary strategy based on sorting, exploration and exploitation. SEE was compared with existing methods on 72 simulated datasets. Its average power is 0.71 on DME models and 0.99 on DNME models, its average F-measure is 0.68 on DME models and 0.99 on DNME models, and its average computation time is 0.21 s on DME models and 0.40 s on DNME models. These results indicate that SEE outperforms the other algorithms in both F-measure and computation time. SEE was then used to analyze real data from the Wellcome Trust Case Control Consortium. Availability and Implementation: SEE is freely available at https://github.com/sunliyan0000/SEE.
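The power and F-measure figures above can be made concrete with a small evaluation sketch. It assumes definitions commonly used in simulation studies of epistasis detection. Power is the fraction of simulated datasets in which the ground-truth SNP combination is reported. The F-measure is the harmonic mean of precision and recall, with a true positive counted when the true combination is reported, a false positive for every other reported combination, and a false negative when the true combination is missed. SEE's exact counting rules may differ, and the SNP identifiers are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Combination = FrozenSet[str]  # an unordered set of SNP identifiers


def evaluate(detected_per_dataset: List[Set[Combination]],
             truth: Combination) -> Tuple[float, float]:
    """Return (power, F-measure) over simulated datasets sharing one true combination."""
    tp = fp = fn = 0
    for found in detected_per_dataset:
        if truth in found:
            tp += 1
            fp += len(found) - 1  # every other reported combination is a false positive
        else:
            fn += 1
            fp += len(found)
    power = tp / len(detected_per_dataset)
    if tp == 0:
        return power, 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return power, 2 * precision * recall / (precision + recall)
```

With four datasets where the method reports {truth}, {truth, other}, {other} and nothing, this gives power 0.5 and F-measure 0.5. Precision and recall are both 2/4.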
Single nucleotide polymorphisms (SNPs) found in genome-wide association studies (GWAS) influence susceptibility to complex diseases, but they cannot fully explain the relationships between mutations and diseases. Interactions between SNPs are considered important for a deeper understanding of these relationships, and several strategies have been proposed to explore them. However, some of these methods perform poorly when the marginal effects of disease loci are weak or absent, others do not consider high-order SNP interactions, and few methods meet the requirements of both performance and accuracy. For these reasons, detection methods should account for low-order and high-order SNP interactions, as well as main-effect SNPs, at an acceptable computational cost. In this paper, we introduce IG (Interaction Gain), a new method for detecting pairwise (low-order) interactions that does not require disease models and uses parallel computing. Furthermore, we propose detecting high-order SNP interactions by finding closely connected functional modules in the network constructed from the IG detection results. On a wide range of simulated datasets and four WTCCC real datasets, the proposed methods accurately detected both low-order and high-order SNP interactions, as well as disease-associated main-effect SNPs, and outperformed all competing methods. This research will advance complex disease research by providing more reliable SNP interactions.
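To show what a pairwise "interaction gain" measures, here is a minimal sketch. It assumes the standard information-theoretic interaction gain, IG(A, B; C) = I(A, B; C) - I(A; C) - I(B; C). Here A and B are two SNPs and C is the phenotype. IG is positive when the pair explains more about the phenotype jointly than the two SNPs do separately. The paper's exact formula, significance testing, parallel implementation and the module-finding step for high-order interactions are not reproduced.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(a: Sequence[int], b: Sequence[int], c: Sequence[int]) -> float:
    """IG(A, B; C) = I(A, B; C) - I(A; C) - I(B; C), assumed definition."""
    ab: List[tuple] = list(zip(a, b))
    return mutual_information(ab, c) - mutual_information(a, c) - mutual_information(b, c)
```

For a purely epistatic XOR-style model (phenotype = A xor B, with no marginal effects), the gain is 1 bit. When the phenotype simply copies SNP A, it is 0. This is why the measure can detect interactions that single-locus tests miss.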
Background: Genome-wide association studies (GWAS) play a very important role in identifying the causes of diseases. Because most existing methods for detecting genetic interactions in GWAS are designed for a single-correlation model, their performance varies considerably across disease models. These methods usually have high computational cost and low accuracy. Method: We present HS-MMGKG, a new multi-objective heuristic optimization method for detecting genetic interactions. HS-MMGKG uses harmony search with five objective functions to improve efficiency and accuracy. A new strategy based on p-values and MDR is adopted to generate more reasonable results, and the Boolean representation used in BOOST is modified so that the five functions can be calculated rapidly. Together, these strategies reduce time complexity and increase accuracy when detecting potential models. Results: We compared HS-MMGKG with CSE, MACOED and FHSA-SED on 26 simulated datasets. The experimental results demonstrate that our method outperforms the others in accuracy and computation time. Our method identified many two-locus SNP combinations associated with seven diseases in the WTCCC dataset. Some of these SNPs have direct evidence in the CTD database. These results may help further explain disease pathogenesis. Conclusion: We anticipate that the proposed algorithm can be used in GWAS to help understand disease mechanisms, diagnosis and prognosis.
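The speed-up from a Boolean genotype representation, as in BOOST, comes from turning contingency-table counting into bitwise AND and popcount operations. The sketch below shows only that general idea. Each SNP is stored as one bitmask per (phenotype class, genotype) pair, and the 3x3 genotype-pair table for each class is filled by popcounts. The modification that HS-MMGKG makes to this representation is not described in the abstract and is not reproduced. Function names are illustrative, and the five objective functions would be computed from a table like this.

```python
from typing import Dict, List, Sequence

Bits = Dict[int, List[int]]  # bits[label][genotype] -> bitmask over samples


def popcount(x: int) -> int:
    """Number of set bits; portable alternative to int.bit_count() (Python 3.10+)."""
    return bin(x).count("1")


def encode_snp(genotypes: Sequence[int], labels: Sequence[int]) -> Bits:
    """Bit i of bits[y][g] is set iff sample i has genotype g (0/1/2)
    and phenotype label y (0 = control, 1 = case)."""
    bits: Bits = {0: [0, 0, 0], 1: [0, 0, 0]}
    for i, (g, y) in enumerate(zip(genotypes, labels)):
        bits[y][g] |= 1 << i
    return bits


def contingency_table(a: Bits, b: Bits) -> Dict[int, List[List[int]]]:
    """Per-class 3x3 counts of genotype pairs (g_a, g_b) via AND + popcount."""
    return {
        y: [[popcount(a[y][ga] & b[y][gb]) for gb in range(3)] for ga in range(3)]
        for y in (0, 1)
    }
```

Because each cell costs one AND and one popcount over packed bits, a table for any SNP pair is built without rescanning the raw genotype matrix. This is what makes evaluating many candidate pairs inside a heuristic search affordable.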