Finding a maximum coverage using k sets from a given collection (Max-k-Cover) and finding a minimum number of sets that achieve a required coverage (Partial-Cover) are both important combinatorial optimization problems. Various problems from data mining, machine learning, social network analysis, operational research, etc. can be generalized as set coverage problems. The standard greedy algorithm is efficient as an in-memory algorithm. However, when facing very large-scale datasets or an online environment, we need an algorithm that makes only one pass through the entire dataset. Previous one-pass algorithms for the Max-k-Cover problem cannot be extended to the Partial-Cover problem and do not enjoy the prefix-optimal property. In this paper, we propose a novel one-pass streaming algorithm that produces a prefix-optimal ordering of sets, which can easily be used to solve both the Max-k-Cover and the Partial-Cover problems. Our algorithm consumes space linear in the size of the universe of elements, and the processing time for a set is linear in the size of that set. We also show, with the aid of computer simulation, that the approximation ratio of our algorithm for the Max-k-Cover problem is around 0.3. We conduct experiments on extensive datasets to compare our algorithm with existing one-pass algorithms on the Max-k-Cover problem, and with the standard greedy algorithm on the Partial-Cover problem. These experiments demonstrate the efficiency and quality of our algorithm.
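The claim that one ordering of sets can serve both problems can be illustrated with the standard in-memory greedy algorithm that the abstract uses as its baseline. The sketch below is not the paper's one-pass streaming algorithm, whose details the abstract does not give. It builds a greedy ordering, reads a Max-k-Cover answer off as the first k sets, and reads a Partial-Cover answer off as the shortest prefix that reaches the required coverage. The function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

def greedy_order(sets: List[Set[int]]) -> List[int]:
    """Order all sets greedily: each step picks the set covering the most still-uncovered elements."""
    covered: Set[int] = set()
    remaining = set(range(len(sets)))
    order: List[int] = []
    while remaining:
        # Largest marginal gain wins; iterating in sorted order breaks ties by smallest index.
        best = max(sorted(remaining), key=lambda i: len(sets[i] - covered))
        order.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return order

def max_k_cover(sets: List[Set[int]], order: List[int], k: int) -> Tuple[List[int], int]:
    """Max-k-Cover from an ordering: the first k sets and how many elements they cover."""
    chosen = order[:k]
    covered = set().union(*(sets[i] for i in chosen))
    return chosen, len(covered)

def partial_cover(sets: List[Set[int]], order: List[int], required: int) -> Optional[List[int]]:
    """Partial-Cover from an ordering: the shortest prefix covering at least `required` elements."""
    if required <= 0:
        return []
    covered: Set[int] = set()
    for n, i in enumerate(order, start=1):
        covered |= sets[i]
        if len(covered) >= required:
            return order[:n]
    return None  # even the union of all sets falls short of `required`
```

For example, with `sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 5}]`, `greedy_order(sets)` returns `[2, 0, 1, 3]`, the first two sets cover all 7 elements, and `partial_cover(sets, order, 4)` returns `[2]`. Each greedy step rescans every remaining set, so this baseline needs the whole collection in memory. That repeated access is the cost a one-pass algorithm avoids.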
We study supergraph search (SPS): given a query graph q and a graph database G that contains a collection of graphs, return the graphs in G that are subgraphs of q (that is, the graphs that have q as a supergraph). SPS has broad applications in bioinformatics, cheminformatics, and other scientific and commercial fields. Determining whether a graph is a subgraph (or supergraph) of another is an NP-complete problem. Hence, it is intractable to compute SPS for large graph databases. Two separate indexing methods, a "filter + verify"-based method and a "prefix-sharing"-based method, have been studied to compute SPS efficiently. To implement either method, subgraph patterns are mined from the graph database to build an index. These subgraphs are mined to optimize either the filtering gain or the prefix-sharing gain; however, no existing subgraph-mining algorithm considers both gains. This work is the first to mine subgraphs that optimize both the filtering gain and the prefix-sharing gain while processing SPS queries. First, we show that the subgraph-mining problem is NP-hard. Then, we propose two polynomial-time algorithms that solve the problem with approximation ratios of 1−1/e and 1/4, respectively. In addition, we construct a lattice-like index, LW-index, to organize the selected subgraph patterns for fast index lookup. Our experiments show that our approach speeds up SPS query processing by a factor of 3 to 10.
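The "filter + verify" idea for SPS rests on one observation. If an indexed feature f is contained in a database graph g but not in the query q, then g cannot be a subgraph of q, so g is pruned without an expensive containment test. The sketch below assumes the features are already given; it does not implement the paper's feature mining or its LW-index. It also uses a hypothetical toy graph representation and a brute-force containment check that is only feasible for tiny graphs.

```python
from itertools import permutations
from typing import FrozenSet, List, Set, Tuple

# Hypothetical toy representation: (vertex labels, undirected edges as frozensets of two vertex ids).
Graph = Tuple[List[str], Set[FrozenSet[int]]]

def contains(big: Graph, small: Graph) -> bool:
    """True if `small` is isomorphic to a (not necessarily induced) subgraph of `big`.
    Brute force over injective vertex mappings, so only usable on tiny graphs."""
    big_labels, big_edges = big
    small_labels, small_edges = small
    if len(small_labels) > len(big_labels):
        return False
    for mapping in permutations(range(len(big_labels)), len(small_labels)):
        # mapping[v] is the vertex of `big` that vertex v of `small` is sent to.
        if any(small_labels[v] != big_labels[mapping[v]] for v in range(len(small_labels))):
            continue
        if all(frozenset(mapping[u] for u in e) in big_edges for e in small_edges):
            return True
    return False

def build_index(db: List[Graph], features: List[Graph]) -> List[Set[int]]:
    """One posting list per feature: ids of the database graphs that contain it."""
    return [{gid for gid, g in enumerate(db) if contains(g, f)} for f in features]

def supergraph_search(q: Graph, db: List[Graph], features: List[Graph], postings: List[Set[int]]) -> List[int]:
    """Return ids of database graphs that are subgraphs of q, using filter + verify."""
    # Filter: if feature f is in g but not in q, g cannot be a subgraph of q (containment is transitive).
    pruned: Set[int] = set()
    for f, posting in zip(features, postings):
        if not contains(q, f):
            pruned |= posting
    # Verify: run the expensive containment test only on graphs that survived filtering.
    return [gid for gid in range(len(db)) if gid not in pruned and contains(q, db[gid])]
```

With a query path C-C-O and four one-edge database graphs (C-C, C-N, C-O, O-O), the features "a single N vertex" and "an O-O edge" prune C-N and O-O without verification. Only C-C and C-O are then checked against q.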
Subgraph search is a useful and challenging query scenario for graph databases. Given a query graph q, a subgraph search algorithm returns all database graphs that have q as a subgraph. To implement subgraph search efficiently, subgraph features are mined in order to index the graph database. Many subgraph feature mining approaches have been proposed. They are all "mine-at-once" algorithms, in which the whole feature set is mined in one run before a stable graph index is built. However, when the environment changes (for example, the graph database is updated or more memory becomes available), the index needs to be updated to accommodate such changes. Most "mine-at-once" algorithms involve frequent subgraph or subtree mining over the whole graph database. Moreover, constructing and deploying a new index involves expensive disk operations, so it is inefficient to re-mine the features and rebuild the index from scratch. We observe that, in most cases, it is sufficient to update a small part of the graph index. Here we propose an "iterative subgraph mining" algorithm, which iteratively finds one feature to insert into (or remove from) the index. Since the majority of the indexing features and the index structure remain unchanged, the algorithm can be invoked frequently. We define an objective function that guides the feature mining. Next, we propose a basic branch-and-bound algorithm to mine the features. Finally, we design an advanced search algorithm, which quickly finds a near-optimal subgraph feature and reduces the search space. Experiments show that our feature mining algorithm is 5 times faster than the popular graph indexing algorithm gIndex, and that features mined by our iterative algorithm have a better filtering rate for the subgraph search problem.
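A single iterative step can be sketched as follows. Given the current index, a pool of candidate features, and a sample of queries, the step picks the one feature whose insertion most shrinks the candidate sets produced by the filter step. The abstract does not spell out its objective function or its branch-and-bound search. This sketch therefore assumes a hypothetical objective (total candidate-set size over the query sample) and enumerates a small precomputed pool instead of mining. It also assumes containment is precomputed: `postings[f]` holds the database graphs containing feature f, and each query is represented by the set of pool features it contains. All names are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

def candidates(query_feats: Set[str], index: Set[str], postings: Dict[str, Set[int]], n_graphs: int) -> Set[int]:
    """Filter step of subgraph search: a graph containing q must contain every indexed feature of q."""
    result = set(range(n_graphs))
    for f in index & query_feats:
        result &= postings[f]
    return result

def total_candidates(queries: List[Set[str]], index: Set[str], postings: Dict[str, Set[int]], n_graphs: int) -> int:
    """Hypothetical objective: summed candidate-set size over a query sample (lower is better)."""
    return sum(len(candidates(qf, index, postings, n_graphs)) for qf in queries)

def best_feature_to_insert(queries: List[Set[str]], index: Set[str], pool: Set[str],
                           postings: Dict[str, Set[int]], n_graphs: int) -> Tuple[Optional[str], int]:
    """One iterative step: the pool feature whose insertion lowers the objective most, or None."""
    best: Optional[str] = None
    best_cost = total_candidates(queries, index, postings, n_graphs)
    for f in sorted(pool - index):  # sorted for deterministic tie-breaking
        cost = total_candidates(queries, index | {f}, postings, n_graphs)
        if cost < best_cost:
            best, best_cost = f, cost
    return best, best_cost
```

Because each call evaluates only one change to an otherwise fixed index, it can be rerun whenever the database or memory budget changes. A removal step would be the symmetric search for the indexed feature whose removal raises the objective least.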