A labeled graph is a graph structure whose nodes carry identifying labels, and it is widely used in information networks, biological networks, and other fields. Subgraph query is an important means of graph data analysis. As labeled graphs grow in size and change dynamically, users tend to focus on the high-match results that interest them, and they want to use the relationships among results and the number of results to obtain query answers quickly. For this reason, we consider the individual needs of users and propose a dynamic Top-K interesting subgraph query. The method establishes a novel graph topology feature index (GTSF index), consisting of a node topology feature index (NTF index) and an edge feature index (EF index), which effectively prunes and filters out invalid nodes and edges that do not meet the restricted condition. Based on the GTSF index, a multi-factor candidate set filtering strategy is proposed that prunes further to obtain smaller candidate sets. We then propose a dynamic Top-K interesting subgraph query method based on the idea of a sliding window, which modifies the subgraph matching results as the labeled graph evolves and thereby keeps the query results real-time and accurate. In addition, to address factors such as frequent input/output (I/O) and network communication overheads, an optimization mechanism for graph changes and an incremental maintenance strategy for the index are proposed to reduce the large cost of redundant operations and global updates. The experimental results show that the proposed method can effectively handle dynamic Top-K interesting subgraph queries on a large-scale labeled graph, and that the optimization mechanism for graph changes and the incremental maintenance strategy for the index effectively reduce maintenance overheads.
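The index-based pruning described above can be illustrated with a minimal sketch. The abstract does not say what the NTF index stores, so this example assumes the simplest plausible node features, a label and a degree, and keeps a data node as a candidate for a query node only if the labels match and its degree is at least the query node's degree. The function names `build_node_index` and `candidates` and the toy graphs are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_node_index(labels, edges):
    """Map each node to (label, degree).

    Assumption: a stand-in for a node feature index, not the paper's NTF index.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {node: (label, degree[node]) for node, label in labels.items()}


def candidates(query_index, data_index):
    """For each query node, keep data nodes with the same label and a degree
    that is at least as large; all other data nodes are pruned."""
    result = {}
    for q, (q_label, q_degree) in query_index.items():
        result[q] = sorted(
            node
            for node, (label, deg) in data_index.items()
            if label == q_label and deg >= q_degree
        )
    return result


# Toy undirected data graph: nodes 1 and 3 are labeled "A", nodes 2 and 4 "B".
data_labels = {1: "A", 2: "B", 3: "A", 4: "B"}
data_edges = [(1, 2), (1, 4), (3, 4)]

# Toy query: one "A" node connected to two "B" nodes.
query_labels = {"x": "A", "y": "B", "z": "B"}
query_edges = [("x", "y"), ("x", "z")]

data_index = build_node_index(data_labels, data_edges)
query_index = build_node_index(query_labels, query_edges)
cand = candidates(query_index, data_index)
```

In this toy case, data node 3 has the right label for `x` but only one neighbor, so the degree check removes it before any subgraph matching is attempted.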
The development of knowledge graphs requires the support of a vast quantity of data. However, the rapid growth in data volume is placing increasing demands on machines. Centralized data storage requires high-performance hosts, which are costly and introduce a single point of failure. Distributed data storage can greatly reduce machine cost and has no single point of failure, but it imposes requirements on how the data collection is partitioned and stored. In domain-specific knowledge storage, the way graph data is partitioned and stored varies with the domain knowledge. To solve these problems, a scheme of graph partition and distributed storage for domain-specific knowledge graphs is proposed. The proposed graph partition scheme takes the correlation between data into account and places nodes that are affiliated with each other in the same or similar partitions. A distributed aggregation storage scheme is designed that makes full use of cluster performance and solves the problem of data consistency during data insertion and update. The proposed distributed storage scheme is based on HBase and is combined with Neo4j to support visual queries effectively. Experimental results show the efficiency and effectiveness of the proposed method in terms of partition time, number of edge cuts, and update time.
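The edge-cut measure used in the evaluation can be made concrete with a small sketch. It counts the edges whose endpoints fall in different partitions, which is the standard definition of edge cut; the abstract does not give the partitioning algorithm itself, so the two assignments below are hand-made illustrations, and the name `edge_cut` is hypothetical.

```python
def edge_cut(edges, partition_of):
    """Count edges whose two endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if partition_of[u] != partition_of[v])


# Toy path graph a - b - c - d.
edges = [("a", "b"), ("b", "c"), ("c", "d")]

# Assignment that keeps neighboring nodes together.
grouped = {"a": 0, "b": 0, "c": 1, "d": 1}

# Assignment that alternates partitions and ignores adjacency.
scattered = {"a": 0, "b": 1, "c": 0, "d": 1}

cut_grouped = edge_cut(edges, grouped)
cut_scattered = edge_cut(edges, scattered)
```

Keeping related nodes in the same partition cuts one edge here, while the alternating assignment cuts all three, which is why a correlation-aware partition is expected to lower cross-partition traffic.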