Fast and exact top-k search for random walk with restart

Fujiwara, Yasuhiro; Nakatsuji, Makoto; Onizuka, Makoto; Kitsuregawa, Masaru

doi:10.14778/2140436.2140441

Cited by 92 publications

(93 citation statements)

References 26 publications


“…Moreover, we also show that
- these can be obtained highly efficiently, if necessary, leveraging existing approximation algorithms [2,4,14,17,21,23,41] and/or parallel implementations [3,32] for computing the PPR scores,
- the proposed formulations are reuse-promoting in the sense that it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation,
- these cached results can then be reused for future queries sharing seed nodes, and
- especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and, thus, significantly reduce the overall robust PPR computation costs.…”

Section: Our Contributions: Robust Personalized PageRank (RPR) (citation type: mentioning)

confidence: 85%

“…Alternatively, PowerIteration [27] or using iterative approximations [14,30], which explicitly simulate the dissemination of probability mass by repeatedly applying the transition process to an initial distribution π₀ until a convergence criterion is satisfied. Recent advances on PPR computation include top-k and approximate personalized PageRank algorithms [2,4,14,17,21,23,41] and parallelized implementations on MapReduce or Pregel based systems [3,32,36,38]. The FastRWR algorithm [41], for example, partitions the graph into subgraphs and indexes partial intermediary solutions.…”

Section: Obtaining PageRank and Personalized PageRank Scores (citation type: mentioning)

confidence: 99%
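The iterative scheme described in the statement above can be illustrated with a minimal sketch. This is plain power iteration for random walk with restart from a single seed, not the k-dash or FastRWR algorithms; the function names, the adjacency-dict input, the restart probability of 0.15, and the handling of dangling nodes (their mass returns to the seed) are all assumptions made for illustration.

```python
def rwr_power_iteration(adj, seed, restart=0.15, tol=1e-12, max_iter=10_000):
    """Approximate RWR scores for one seed by power iteration.

    adj maps every node to a list of its out-neighbours (assumed format;
    every neighbour must itself be a key of adj).
    """
    nodes = list(adj)
    # Initial distribution: all probability mass on the seed.
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        # With probability `restart` the walker jumps back to the seed.
        nxt[seed] = restart
        for u in nodes:
            out = adj[u]
            if not out:
                # Assumption: a dangling node sends its mass back to the seed.
                nxt[seed] += (1.0 - restart) * p[u]
                continue
            share = (1.0 - restart) * p[u] / len(out)
            for v in out:
                nxt[v] += share
        # Convergence criterion: L1 change between successive distributions.
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


def top_k(scores, k):
    """Return the k nodes with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Tiny hypothetical graph used only to exercise the functions.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = rwr_power_iteration(graph, "a")
best_two = top_k(scores, 2)
```

Computing the full score vector and then sorting is the naive route to a top-k answer; exact top-k methods such as the one this page indexes aim to avoid scoring every node.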

“…This is especially advantageous when G is large as we can leverage any of the highly effective approximation algorithms [2,4,14,17,21,23,41] or parallelized implementations [3,32] for computing these PPR scores. Most importantly, the first step of the algorithm (where we solve a linear equation independently for each seed node) can be trivially parallelized by assigning each node to a different computation unit.…”

Section: Converting the Problem Into A Set Of Linear Equations (citation type: mentioning)

confidence: 99%
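The per-seed decomposition mentioned above rests on a standard property: personalized PageRank is linear in its teleport vector, so scores for a seed set can be assembled from cached single-seed results. The sketch below demonstrates only that property, not the RPR formulation of the citing paper; the function names, the example graph, the fixed iteration count, and the assumption that every node has at least one out-neighbour are illustrative choices.

```python
def ppr(adj, teleport, restart=0.15, iters=200):
    """Power iteration for PPR with an arbitrary teleport distribution.

    adj maps each node to a non-empty list of out-neighbours (assumption);
    teleport maps seed nodes to probabilities that sum to 1.
    """
    p = {v: teleport.get(v, 0.0) for v in adj}
    for _ in range(iters):
        nxt = {v: restart * teleport.get(v, 0.0) for v in adj}
        for u, out in adj.items():
            share = (1.0 - restart) * p[u] / len(out)
            for v in out:
                nxt[v] += share
        p = nxt
    return p


# Hypothetical graph and seed set, for illustration only.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": ["a"]}
seeds = ["a", "b"]

# Each seed is solved independently; these runs could execute in parallel
# and the results could be cached for later queries that share seeds.
cache = {s: ppr(graph, {s: 1.0}) for s in seeds}

# Combine the cached single-seed vectors with uniform weights.
combined = {v: sum(cache[s][v] for s in seeds) / len(seeds) for v in graph}

# Reference: solve directly with a uniform teleport over the seed set.
direct = ppr(graph, {s: 1.0 / len(seeds) for s in seeds})

max_gap = max(abs(combined[v] - direct[v]) for v in graph)
```

Here `max_gap` should be at the level of floating-point rounding, which is what makes caching per-seed vectors worthwhile when many queries share seeds.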


Reducing seed noise in personalized PageRank

Huang, Candan, et al. 2016

Soc. Netw. Anal. Min.

Network-based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random-walk-based techniques, such as PageRank, encode the structure of the graph in the form of a transition matrix of a stochastic process from which the significances of the nodes in the graph are inferred. Personalized PageRank (PPR) techniques complement this with a seed node set which serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes in situations where the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation discounting and seed-set maximal PPR techniques help eliminate harmful bias of individual seed nodes and provide effective seed differentiation to lead to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, where existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting in the sense that it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation, and, especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and reduce the overall robust PPR computation costs. Experimental results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.


“…RWR is a PageRank-like node proximity based on a random surfer model. In comparison with other relevance measures, RWR has the following two benefits [5]: (1) it can globally capture the entire topology of a graph; (2) its proximity values can be used for ranking objects with respect to a certain query, as opposed to PageRank that is query-independent.…”

Section: Introduction (citation type: mentioning)

confidence: 99%

“…Very recently, for top-K search, Fujiwara et al. [1] have proposed an excellent algorithm called k-dash, which can be regarded as the state-of-the-art one for computing RWR. Unfortunately, their strategy involves a large LU matrix decomposition over an entire graph, which is still time-consuming.…”

Section: Introduction (citation type: mentioning)

confidence: 99%

Efficient Processing Node Proximity via Random Walk with Restart

Wang et al. 2014

Web Technologies and Applications

Abstract. Graphs are a useful tool for modeling complicated data structures. One important task in graph analysis is assessing node proximity based on graph topology. Recently, Random Walk with Restart (RWR) has emerged as a promising measure of node proximity, owing to its many applications in, e.g., recommender systems and image segmentation. However, the best-known algorithm for computing RWR resorts to a large LU matrix factorization over the entire graph, which is cost-prohibitive. In this paper, we propose hybrid techniques to efficiently compute RWR. First, a novel divide-and-conquer paradigm is designed, aiming to convert the large LU decomposition into small triangular matrix operations applied recursively on several partitioned subgraphs. Then, on every subgraph, a "sparse accelerator" is devised to further reduce the time of RWR without any sacrifice in accuracy. Our experimental results on real and synthetic datasets show that our approach outperforms the baseline algorithms by at least one constant factor without loss of exactness.


A novel top‐k key node query problem in subgraph matching and its greedy strategy

Xue 2021

Engineering Reports

Top‐k node selection in graph data is an essential problem in computer science and its applications. Focusing on subgraph matching, an important issue in the field of graph data, we define the top‐k key node query problem w.r.t. subgraph matching and propose a method for it. Unlike the general top‐k query problem, we aim to find k nodes that together cover as many of the matching subgraphs in the data graph G as possible. This is a maximum coverage problem over subgraph matches, which is NP‐hard. We study the problem based on a greedy algorithm and give an intuitive solution. Considering the characteristics of the top‐k problem, we propose an improved and more efficient greedy algorithm. Experiments on a real social network graph data set (Twitter) show that the resulting key nodes better reveal the essential characteristics of the query graph in the data graph G. The key node query problem in subgraph matching proposed in this article may have extensive real-world applications, such as assessing the influence of specific group members in a social network, detecting abnormal communication in a computer communication network, and evaluating road traffic and load balance in a road traffic network.
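The abstract does not spell out its greedy procedure, so the following is only a sketch of the textbook greedy heuristic for maximum coverage applied to this setting, not the improved algorithm the article proposes. It assumes each matched subgraph is given as a set of node identifiers; the function name and the example data are hypothetical.

```python
def greedy_key_nodes(matches, k):
    """Pick up to k nodes that greedily cover as many matches as possible.

    matches is a list of sets, each holding the nodes of one matched
    subgraph (assumed input format). A match counts as covered once any
    of its nodes has been chosen.
    """
    uncovered = set(range(len(matches)))
    candidates = set().union(*matches) if matches else set()
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0
        # Sorted iteration makes tie-breaking deterministic.
        for node in sorted(candidates - set(chosen)):
            gain = sum(1 for i in uncovered if node in matches[i])
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:
            break  # every match is already covered
        chosen.append(best)
        uncovered = {i for i in uncovered if best not in matches[i]}
    return chosen, len(matches) - len(uncovered)


# Hypothetical matches, for illustration only.
example_matches = [{1, 2}, {2, 3}, {3, 4}, {5, 6}]
key_nodes, covered = greedy_key_nodes(example_matches, 2)
```

Each round rescans every candidate, which is simple but slow on large graphs; that cost is the kind of inefficiency a refined greedy variant would target.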
