With the widespread availability of graph-structured data from sources ranging from social networks to biochemical processes, there is an increasing need for efficient and effective graph analysis techniques. Graphs with millions of vertices and beyond are commonplace, necessitating both efficient serial algorithms and scalable parallel formulations. This paper addresses the problem of global graph alignment on supercomputer-class clusters. Given two graphs (or two instances of the same graph), we define graph alignment as a mapping of each vertex in the first graph to a unique vertex in the second graph so as to optimize a given similarity-based cost function.¹ Graph alignment is typically implemented in two steps. In the first step, a similarity matrix is computed; entries in the matrix quantify the similarity of node pairs, one chosen from each graph. In the second step, similar vertices are extracted through a bipartite matching algorithm applied to the similarity matrix. Using a state-of-the-art serial algorithm for similarity matrix computation called Network Similarity Decomposition (NSD), we derive corresponding parallel formulations. Coupling this parallel similarity algorithm with a parallel auction-based bipartite matching technique, we derive a complete graph matching pipeline that is highly efficient and scalable. We validate the performance of our integrated approach on a large, supercomputer-class cluster and diverse graph instances (including Protein Interaction (PPI) networks, Web graphs, and Wikipedia link structures). Experimental results demonstrate that our algorithms scale to large machine configurations and problem instances. Specifically, we show that our integrated pipeline enables the alignment of networks of sizes two orders of magnitude larger than currently possible (millions of vertices, tens of millions of edges).
(Ananth Grama)
¹ In the sequel, we use the word alignment as a synonym for global graph alignment; this is in contrast to local graph alignment, which permits a vertex to have different pairings in feasible local alignments, making it an inherently ambiguous process.
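The second step of the pipeline described above, auction-based bipartite matching on a similarity matrix, can be illustrated with a small serial sketch. This is not the parallel implementation from the paper; it is a textbook forward auction for a square, dense similarity matrix, and the function name `auction_matching` and the parameter `eps` are assumptions made for illustration.

```python
import numpy as np

def auction_matching(sim, eps=1e-3):
    """Serial forward auction on a square similarity matrix (illustrative sketch).

    Row i is a vertex of the first graph (a bidder), column j a vertex of the
    second graph (an object). Returns a list where entry i is the column
    matched to row i. The total similarity is within n * eps of the optimum.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    prices = np.zeros(n)          # current price of each column
    owner = [-1] * n              # owner[j]: row currently holding column j
    assigned = [-1] * n           # assigned[i]: column currently held by row i
    unassigned = list(range(n))

    while unassigned:
        i = unassigned.pop()
        values = sim[i] - prices  # net value of every column for row i
        j = int(np.argmax(values))
        best = values[j]
        values[j] = -np.inf
        second = values.max() if n > 1 else best
        # Raise the price by the bidding increment plus eps.
        prices[j] += best - second + eps
        if owner[j] != -1:        # evict the previous holder of column j
            assigned[owner[j]] = -1
            unassigned.append(owner[j])
        owner[j] = i
        assigned[i] = j
    return assigned

if __name__ == "__main__":
    S = np.array([[10.0, 9.0],
                  [9.0, 1.0]])
    print(auction_matching(S))
```

On the small matrix in the usage example, the auction pairs row 0 with column 1 and row 1 with column 0, which has a higher total similarity than taking the largest entry first.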
We present a PDE-constrained optimization algorithm designed for parallel scalability on distributed-memory architectures with thousands of cores. The method is based on a line-search interior-point algorithm for large-scale continuous optimization. It is matrix-free in that it does not require the factorization of derivative matrices; instead, it uses a new parallel and robust iterative linear solver on distributed-memory architectures. We show almost linear parallel scalability results for the complete optimization problem, which is an emerging and important biomedical application related to antenna identification in hyperthermia cancer treatment planning.
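To make the term "matrix-free" concrete, the sketch below solves a linear system using only a function that applies the operator to a vector, so the matrix is never stored or factorized. It uses plain conjugate gradients on a symmetric positive definite model problem; this is an assumption chosen for brevity and is not the solver from the paper, whose interior-point systems are generally indefinite and need a different Krylov method and preconditioner. The names `conjugate_gradient` and `laplacian_1d` are hypothetical.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b - matvec(x)             # initial residual
    p = r.copy()                  # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def laplacian_1d(v):
    """Apply tridiag(-1, 2, -1) to v without ever forming the matrix."""
    out = 2.0 * v
    out[:-1] -= v[1:]
    out[1:] -= v[:-1]
    return out

if __name__ == "__main__":
    x_true = np.linspace(0.0, 1.0, 20)
    b = laplacian_1d(x_true)
    x = conjugate_gradient(laplacian_1d, b)
    print(np.max(np.abs(x - x_true)))
```

Because the solver touches the operator only through `matvec`, each application can be distributed across processes independently of the solver logic, which is the property that makes matrix-free methods attractive on distributed-memory machines.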
Background: Global network alignment has been proposed as an effective tool for computing functional orthology. Commonly used global alignment techniques such as IsoRank rely on a two-step process: the first step is an iterative diffusion-based approach for assigning similarity scores to all possible node pairs (matchings); the second step applies a maximum-weight bipartite matching algorithm to this similarity score matrix to identify orthologous node pairs. While demonstrably successful in identifying orthologies beyond those based on sequences, this two-step process is computationally expensive. Recent work on the computation of node-pair similarity matrices has demonstrated that the computational cost of the first step can be significantly reduced. With these accelerated methods, the bipartite matching step becomes the dominant computational cost. This motivates a critical assessment of the tradeoffs between computational cost and solution quality (matching quality, topological matches, and biological significance) associated with the bipartite matching step. In this paper we utilize the state-of-the-art core diffusion-based step in IsoRank for similarity matrix computation, and couple it with two heuristic bipartite matching algorithms: a matrix-based greedy approach, and a tunable, adaptive, auction-based matching algorithm developed by us. We then compare our implementations against the performance and quality characteristics of the solution produced by the reference IsoRank binary, which also implements an optimal matching algorithm.

Results: Using heuristic matching algorithms in the IsoRank pipeline yields dramatic speedups; the total alignment process is typically 30 times faster in most cases of interest. More surprisingly, these improvements in compute times are typically accompanied by better or comparable topological and biological quality for the network alignments generated. These measures are quantified by the number of conserved edges in the alignment graph, the percentage of enriched components, and the total number of covered Gene Ontology (GO) terms.

Conclusions: We have demonstrated significant reductions in global network alignment computation times by coupling heuristic bipartite matching methods with the similarity scoring step of the IsoRank procedure. Our heuristic matching techniques maintain comparable, if not better, quality in the resulting alignments. A consequence of our work is that network-alignment-based orthologies can be computed within minutes (as compared to hours) on typical protein interaction networks, enabling a more comprehensive tuning of alignment parameters for refined orthologies.
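The two-step process described in this abstract can be sketched on dense toy matrices. The first function follows the commonly cited IsoRank-style update R ← α Ã R B̃ᵀ + (1 − α) E with column-normalized adjacency matrices; the second is a simple greedy matching that repeatedly takes the largest remaining similarity entry. The function names, the default `alpha`, the fixed iteration count, and the uniform prior `E` are assumptions for illustration and are not taken from the reference IsoRank binary or the authors' code.

```python
import numpy as np

def isorank_similarity(A, B, alpha=0.85, iters=50, E=None):
    """Diffusion-style similarity scores between vertices of graphs A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    dA, dB = A.sum(axis=0), B.sum(axis=0)
    # Column-normalize; zero-degree columns are left as zeros.
    An = A / np.where(dA > 0, dA, 1.0)
    Bn = B / np.where(dB > 0, dB, 1.0)
    nA, nB = A.shape[0], B.shape[0]
    if E is None:
        E = np.full((nA, nB), 1.0 / (nA * nB))   # uniform prior (assumption)
    R = E.copy()
    for _ in range(iters):
        R = alpha * (An @ R @ Bn.T) + (1.0 - alpha) * E
        R /= R.sum()                              # keep scores normalized
    return R

def greedy_matching(S):
    """Pick the largest remaining entry, then discard its row and column."""
    S = np.asarray(S)
    order = np.argsort(S, axis=None)[::-1]        # flat indices, descending
    used_rows, used_cols, pairs = set(), set(), []
    for flat in order:
        i, j = divmod(int(flat), S.shape[1])
        if i not in used_rows and j not in used_cols:
            pairs.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
            if len(pairs) == min(S.shape):
                break
    return pairs

if __name__ == "__main__":
    path = np.array([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]])
    R = isorank_similarity(path, path)
    print(greedy_matching(R))
```

Greedy matching sorts the entries once and scans them, which is why it is much cheaper than an optimal maximum-weight matching; the price is that it can commit early to a large entry and miss a better overall assignment.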